Capsule Description


This module examines the techniques, assessment,
and management of unit analysis and testing.
Analysis strategies are classified according to the view
they take of the software: textual, syntactic, control
flow, data flow, computation flow, or functional.
Testing strategies are categorized according to
whether their coverage goal is specification-oriented,
implementation-oriented, error-oriented, or a
combination of these. Mastery of the material in this
module allows the software engineer to define, conduct,
and evaluate unit analyses and tests and to assess
new techniques proposed in the literature.


A Word About This Version

This version of Unit Analysis and Testing contains
many changes that will be noticed by readers of
earlier versions, including the relatively minor title
change from Unit Testing and Analysis. Lionel
Deimel functioned as an active technical editor of
previous editions, and his status has been upgraded
to that of coauthor.

The scope and goals of this curriculum module are
largely unchanged (see Philosophy below), but the
material has been updated and reorganized to reflect
our rapidly expanding understanding of analysis and
testing techniques. In particular, we have
recognized that each testing technique requires
determination of software characteristics, the discovery of
which we have named analysis.¹ Since one testing
technique may rely on several analysis techniques,
and one analysis technique may support several
testing (and other verification) techniques, analysis
techniques are discussed separately from the testing
techniques they support.

¹ In the earlier versions of this module, “analysis” referred to
verification techniques that did not require execution of the
software. We feel the narrowed definition used here is more in
keeping with the conventional usage of the word.

We believe the current organization provides greater
insight into the nature of the techniques described
and their relationship to one another. We have also
been more deliberate about definitions and have
introduced them within the context of a model of
verification.

At every turn, we have had to resist turning the
module into a monograph. The goal, of course, was to
keep this work an approachable outline with
bibliography and teaching suggestions. Achieving this goal
meant deleting related material, paring the size of
the bibliography, not explaining certain concepts as
much as we would like, and not relating concepts to
one another at greater length. The authors take full
responsibility for what may seem, in places, a
looseness of integration. We believe the casual reader
will appreciate our brevity, however, and the careful
reader will receive sufficient guidance to fill in any
gaps.

The emphasis here is on testing, and the reader
should be warned that the treatment of analysis is
not comprehensive, but is meant to provide the
necessary background for the discussion of the main
topic. Perhaps in the future, unit analysis will
receive the attention it deserves in its own
curriculum module.


Philosophy 


Program testing is the most practiced means of
verifying that a program possesses the features required
by its specification. Testing is a dynamic approach
to verification in which software is executed with
test data to assess the presence of required features.
The inferences involved in this assessment are
surprisingly complex. Testing employs analysis to
determine software characteristics, which are then
used to evaluate whether features are present or not.

Many verification techniques have become
established technologies with their own substantial
literature. So that they may be given adequate treatment
elsewhere, these techniques have been placed
outside the scope of this module. Included among these
techniques are proof of correctness, safety
verification, and the more open-ended verification
procedures represented by code inspections and reviews.

This module focuses on unit-level analysis and
testing techniques; integration and systems testing are
outside our scope. What constitutes a “unit” has
been left imprecise: it may be as little as a single
statement or as much as a set of coupled subroutines.
The essential characteristic of a unit is that it can
meaningfully be treated as a whole.

Because testing is a form of verification, it cannot be
performed in the absence of requirements. Included
in requirements are not only written specifications,
standards, and the like, but also implicit or unwritten
understandings of what the software should do.

Analysis techniques are classified here according to the
kinds of software characteristics they discover.
Software characteristics are described as reflecting
different views of the software: textual, syntactic,
control flow, data flow, computation flow, or functional.
By helping to discover software characteristics,
analysis techniques play a part in many verification
techniques, including testing.

Three major classes of testing are discussed
(specification-oriented, implementation-oriented, and
error-oriented), as well as some hybrid approaches.
Specification-oriented testing ensures that specified
major features of the software are covered.
Implementation-oriented testing ensures that major
characteristics of the code are covered.
Error-oriented testing ensures that the range of typical
errors is covered. The benefits of using techniques
from different classes are complementary, and no
single technique is comprehensive.

Assessment of unit analysis and testing techniques
can be theoretical or empirical. This module
presents both of these forms of assessment, and
discusses criteria for selecting methods and controlling
the verification process.

Management of unit analysis and testing should be
systematic. It proceeds in two stages. First,
techniques appropriate to the project must be selected.
Then these techniques must be systematically
applied.


Objectives 


The following is a list of possible educational
objectives based upon the material in this module.
Objectives for any particular unit of instruction may be
drawn from these or related objectives, as may be
appropriate to audience and circumstances.

Knowledge 

• Define the basic terminology of analysis and
testing (particularly those terms found in the
glossary on page 18 or italicized in the text).

•  State  the  theoretical  and  computational 
limitations  of  analysis  and  testing. 

•  State  the  strengths  and  weaknesses  of  several 
analysis  and  testing  techniques. 

•  Identify  six  program  views. 

Comprehension 

•  Explain  the  complementary  nature  of 
specification-oriented,  implementation- 
oriented,  and  error-oriented  testing  techniques. 

•  Describe  how  the  choice  of  analysis  and  testing 
criteria  affects  the  selection  and  evaluation  of 
test  data. 

•  Explain  the  role  of  error  collection  as  a 
feedback  and  control  mechanism. 

• Explain the program view taken by a given
testing technique.

Application 

•  Test  a  software  unit  using  specification- 
oriented,  implementation-oriented  and  error- 
oriented  techniques. 

•  Use  configuration  management  to  control  the 
process  of  unit  analysis  and  testing. 

Analysis 

•  Determine  the  unit  analysis  and  testing 
techniques  applicable  to  a  project,  based  upon 
the  verification  goals,  the  nature  of  the  product, 
and  the  nature  of  the  testing  environment. 

Synthesis 

•  Write  a  test  plan  tailored  to  accommodate 
project  constraints. 

•  Design  software  tools  to  support 
implementation-oriented  analysis  and  testing 
techniques. 

Evaluation 

• Evaluate the potential usefulness of new unit
analysis or testing techniques proposed in the
literature.


Prerequisite Knowledge


The student of unit analysis and testing should, of
course, have a solid background in programming.
Some mathematical sophistication is required,
including a working knowledge of logic, relations, and
functions. Beyond this, necessary background is
dictated by the topics to be covered. Some of the
specification-oriented techniques require that the
student be able to read algebraic, axiomatic, and
functional specifications of software modules. The
implementation-oriented component requires
knowledge of BNF grammars and graphs. If structural
analysis tools are to be built, the student needs to
have knowledge of parsing technology, parse trees,
and graph algorithms at the level of an introductory
compiler construction course. To understand the
fundamental limitations of testing, the student
should be familiar with the halting problem and
reduction proofs. If underlying foundations of
statistical testing are to be explored in depth, then a full
year of statistics is a prerequisite. Effective use of
the statistical models requires one semester of
statistics.


Module Content


Outline

I.  Preliminaries 

1.  Concepts  and  terminology 

2.  Adequacy  of  testing 

3.  Limitations  of  testing 

4.  Organization  of  this  module 

II.  Program  Analysis  Techniques 

1.  From  a  textual  view 

2.  From  a  syntactic  view 

3.  From  a  control  flow  view 

4.  From  a  data  flow  view 

5.  From  a  computation  flow  view 

6.  From  a  functional  view 

III.  Program  Testing  Techniques 

1.  Specification-oriented  testing 

a.  Testing  independent  of  the  specification 
technique 

(i)  Testing  based  on  the  interface 

(1)  Input  domain  testing 

(2)  Equivalence  partitioning 

(3)  Syntax  checking 

(ii) Testing based on the function to be
computed

(1)  Special  value  testing 

(2)  Output  domain  coverage 

b.  Testing  dependent  on  the  specification 
technique 

(i)  Algebraic 

(ii)  Axiomatic 

(iii)  State  machines 

(iv)  Decision  tables 

2.  Implementation-oriented  testing 

a.  Structure-oriented  testing 

(i)  Statement  testing 

(ii)  Branch  testing 

(iii)  Data  coverage  testing 

b.  Infection-oriented  testing 

(i)  Conditional  testing 

(ii)  Expression  testing 

(iii)  Domain  testing 


(iv)  Perturbation  testing 

(v)  Fault  sensitivity  testing 

c.  Propagation-oriented  testing 

(i)  Path  testing 

(ii)  Compiler-based  testing 

(iii)  Data  flow  testing 

(iv)  Mutation  testing 

3.  Error-oriented  testing 

a.  Error-based  testing 

b.  Fault-based  testing 

c.  Probable  correctness 

4.  Hybrid  Testing  Techniques 

IV.  Evaluating  Unit  Analysis  and  Testing 
Techniques 

1.  Theoretical  evaluation 

2.  Empirical  evaluation 

V.  Managerial  Aspects  of  Unit  Analysis  and  Testing 

1.  Selecting  techniques 

2.  Goals 

3.  Nature  of  the  product 

a.  Data  processing 

b.  Scientific  computation 

c.  Expert  systems 

d.  Embedded  and  real-time  systems 

4.  Nature  of  the  testing  environment 

a.  Available  resources 

b.  Personnel 

c.  Project  constraints 

5.  Control 

a.  Configuration  control 

b.  Conducting  tests 


Annotated Outline

I.  Preliminaries 

The analysis and testing techniques with which this
curriculum module deals are numerous and diverse.
Unfortunately, there is neither a widely accepted
taxonomy of them nor a standard model from which a
taxonomy might be derived. Some of the
classifications seen in the literature (the often-made distinction
between static analysis and dynamic analysis, for
example) seem to us insufficiently useful for making
sense of this material. In this section, therefore, we
present a framework for understanding unit analysis
and testing. This framework not only provides
organization for what follows, but also, we hope, provides
some insight into the techniques of interest and into
why their application is complex.

We begin by defining terms that will be used
throughout the module. Then we examine the complementary
benefits of different testing techniques, followed by the
inherent limitations of testing. We conclude the
section with an overview of the remaining material.

1.  Concepts  and  terminology 

As  is  often  true  in  an  evolving  field,  terminology 
used  to  describe  program  testing  is  far  from  settled. 
Although  most  experts  would  probably  agree  on 
designating  certain  core  activities  as  “testing,”  the 
boundary  between  what  is  and  is  not  testing  is  not  so 
easily  agreed  upon. 

We believe that precise definitions are important and
serve to sharpen one’s understanding. We further
believe that testing, in the narrowest sense of
executing a software unit on selected data, has analogues in
other software engineering activities (in the
conduct of software inspections, for example) that
might reasonably be called by that name if one were
willing to expand its meaning sufficiently. Such a
generalization is tempting, but it demands the
thorough exploration of a large gray area through
which the resulting boundary of definition would
have to pass. We have chosen to resist this
temptation, not only because of the inherent difficulty it
entails, but also because the generalization is likely to
seem more useful than it actually is. We therefore
adopt a narrow definition of testing, one we think is
precise and useful, and which reflects the common
understanding of the term.

The standards for software test documentation
[IEEE83] and for software unit testing [IEEE87], as
well as the standard glossary of software
engineering terms [IEEE90], define many testing-related
terms. These definitions often clarify the meaning
of words used inconsistently both in engineering
practice and in the literature. We have drawn from
these sources wherever possible, though we must
point out that the standards are not 100% consistent
with one another. We have introduced our own
definitions where doing so seems useful.
Fundamental terms (shown in italics) are discussed below
and are collected in a glossary that begins on page
18.¹ Where appropriate, citations of sources are
shown in the glossary.

¹ Italicized terms outside this introduction do not necessarily appear in
the glossary.

Informally, testing is verification using information
derived from execution of software. But by itself, a
simple definition of testing fails adequately to
convey either its complexity or its subtlety.
Constructing a model to discuss testing can help us gain
greater insight. The figure on page 20 presents such
a model, in which rectangles represent objects or
collections of objects, and directed arcs represent
relations.

At the top left of the figure, we show
“requirements,” the collection of statements, diagrams,
understandings, and other information that define and
constrain the software to be produced.² As used
here, requirements include not only the immediate
precursor of a software unit (for example, a
high-level design) but also other assertions, written or
not, that serve to establish desired properties of the
software.

To the right of requirements is “software,” the code
we wish to verify or test. (For purposes of this
module, we may prefer to think of this box as being
labeled “software unit.”) The requirements are
intended to (correctly) specify the software, to define
what is and is not an acceptable implementation;
verification seeks to determine whether the intended
relationship is actually achieved. In practice, of
course, either the requirements or the software may
be defective.

Unfortunately, software cannot be directly measured
against requirements. Instead, it is necessary to
determine software characteristics through some
process of analysis as one important step in verification.
A characteristic is a trait, quality, or property of the
software, whether intended or not. Both the number
of vowels in the source code and the response of the
software when executed on a given input value are
software characteristics. The former characteristic is
determined through static analysis (i.e., without
executing the software) and the latter is usually
determined through dynamic analysis (i.e., by some
process that does involve execution). “Software
characteristics” are shown in the lower right of the
figure. Through analysis, the software in question is
inferred to possess some set of software
characteristics. Note that we do not say “possesses,” as our
analysis can be faulty (our argument that a loop
always terminates may contain a flaw), or it can lead to an
insufficiently qualified result: our observation that
the correct output is produced in response to a
particular input does not necessarily mean that the
software always does so, as its output may depend on an
internal state affected by earlier input or by some
unrecognized “input,” such as time of day.
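
The distinction between the two kinds of characteristic can be
sketched in a few lines of Python. This is only an illustration
under our own assumptions: the unit greet, its clock parameter,
and the helper names are hypothetical and do not come from the
module. The vowel count is obtained without running the unit; the
response is obtained by running it, and it changes with an
“input” (the hour of the day) that a tester might not recognize.

```python
import time
from types import SimpleNamespace

# Hypothetical unit under analysis, kept as text so it can be inspected statically.
SOURCE = '''
def greet(name, clock=time.localtime):
    hour = clock().tm_hour
    return ("Good morning, " if hour < 12 else "Good afternoon, ") + name
'''

def count_vowels(text):
    """Static analysis: examine the text only; nothing is executed."""
    return sum(1 for ch in text.lower() if ch in "aeiou")

def observe(source, func_name, *args, **kwargs):
    """Dynamic analysis: execute the unit and record its response."""
    namespace = {"time": time}
    exec(source, namespace)
    return namespace[func_name](*args, **kwargs)

vowels = count_vowels(SOURCE)

# The same explicit input gives different responses at different hours.
at_nine = observe(SOURCE, "greet", "Ada", clock=lambda: SimpleNamespace(tm_hour=9))
at_three = observe(SOURCE, "greet", "Ada", clock=lambda: SimpleNamespace(tm_hour=15))
```

A tester who ran greet only in the morning might infer a
characteristic (“always answers Good morning”) that the software
does not possess.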


² We will not attempt to distinguish between the form of requirements
and their abstract semantics. Because our definition of requirements
includes undocumented expectations, sometimes there is no form we can
point to!

In an ideal world, perhaps, verification would
involve matching identified software characteristics to
correct, written requirements. So simple a process is
almost never appropriate, however: not everything
gets written down; requirements can be incomplete,
overspecified, or contradictory; stated requirements
can be too complex to test directly; and, particularly
when implied and commonsense but unstated
requirements are accounted for, there may simply be
too many properties of the software to verify in
practice. It is therefore necessary to identify some
set of software features against which verification
takes place. The box in the lower left of the figure,
therefore, represents such a collection of software
characteristics specified or implied by the
requirements. This collection, for purposes of verification,
forms a (possibly inadequate) representation of what
the software is supposed to be and to do.

Whereas verification of a piece of software attempts
to determine if that software actually implements the
requirements (i.e., is within the class of acceptable
implementations specified by the requirements), the
task is actually carried out by attempting to show
that the inferred software characteristics are
sufficient to demonstrate that the required features are
present. Of course, like the reasoning by which
software features and software characteristics are
derived, this process, too, is subject to error.

If the verification process relies on dynamic analysis
to infer software characteristics (i.e., on analysis
involving executing the code), we say the verification
is a form of testing.³ Different methods of testing
appeal to the nature of requirements, to the code
itself, or to insights into how errors in the
development process manifest themselves in requirements
or software, in order to suggest test data and test
procedures. Test procedures include those for
program execution, data collection, and assessment of
the results. Program execution involves the
activities of selecting test data and determining the
expected output, constructing an environment for
executing the software with the selected data, and
performing the actual execution. Information to be
collected from the execution may be obtained in a very
straightforward manner, such as from observation of
a screen display, or it may require instrumenting the
software with probes that reveal runtime behavior.
Assessing the results of execution involves inferring
software characteristics from the collected data and
comparing those characteristics with the required
features to see if the presence of the required
features is indicated by the empirical evidence.
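
The three test procedures named above (program execution, data
collection, and assessment) can be separated in a small harness.
The following Python sketch is one possible arrangement, not a
prescribed one; the unit absolute_value, the test data, and the
function names are all hypothetical.

```python
def absolute_value(x):
    """Hypothetical unit under test."""
    return x if x >= 0 else -x

# Selected test data, each paired with its expected output.
TEST_CASES = [(5, 5), (-5, 5), (0, 0)]

def run_tests(unit, cases):
    """Execution and data collection: run the unit and record what happened."""
    observations = []
    for datum, expected in cases:
        try:
            actual = unit(datum)
        except Exception as exc:  # an exception is also an observed behavior
            actual = exc
        observations.append((datum, expected, actual))
    return observations

def assess(observations):
    """Assessment: return the observations that contradict the expected output."""
    return [obs for obs in observations if obs[1] != obs[2]]

observations = run_tests(absolute_value, TEST_CASES)
failures = assess(observations)
```

Keeping collection apart from assessment makes it possible to
re-examine the recorded observations later, for instance when the
expected outputs themselves turn out to be wrong.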


³ Compare [IEEE90]. In [IEEE83], testing is more broadly defined to
include both static and dynamic verification techniques. (See Glossary,
page 18.)


2.  Adequacy  of  testing 

As should be clear from the foregoing discussion,
verification of software, by testing or other means, is
quite indirect. Required features and software
characteristics must be derived and compared, and an
argument may need to be made that establishing the
presence of the features should be accepted as
indicating that the software indeed satisfies the
requirements. There are many opportunities for
making erroneous inferences, yet resource
limitations invariably dictate making inferences that are
probably, though not necessarily, correct.⁴

To strengthen the conclusions that can be drawn
from testing, it is necessary to judiciously constrain
the verification process. Conditions that are
required to be satisfied during testing are called
adequacy criteria [Weyuker86]. For example,
testing may be considered inadequate if the test data do
not include boundary cases specified by the
requirements, do not cause execution of every line of code,
or do not cause the software to deal with error-prone
situations. The intent in establishing these criteria is
to improve the quality of the testing. As such,
adequacy criteria serve a purpose somewhat akin to
software development standards, by requiring
adherence to methods that have previously proved
successful.
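
One of the criteria mentioned above, execution of every line of
code, can be checked mechanically. The sketch below uses Python's
standard sys.settrace hook to record which lines of a unit a set
of test data executes, and dis.findlinestarts to list the lines
that carry code. The unit classify and the helper names are
hypothetical, and the approach is a simplified illustration
rather than a substitute for a coverage tool.

```python
import dis
import sys

def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

def statement_lines(func):
    """Line offsets (relative to the def line) that carry executable code."""
    code = func.__code__
    return {line - code.co_firstlineno
            for _, line in dis.findlinestarts(code)
            if line is not None and line > code.co_firstlineno}

def executed_lines(func, inputs):
    """Run func on each input and record the line offsets actually executed."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        for value in inputs:
            func(value)
    finally:
        sys.settrace(None)
    return hit

def missing_lines(func, inputs):
    """The adequacy criterion acting as judge: lines the test data never reached."""
    return statement_lines(func) - executed_lines(func, inputs)
```

An empty result from missing_lines says only that the criterion is
met. It says nothing about whether the outputs were correct, which
is the distinction between quality of testing and quality of
software drawn in the next paragraph.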

When considering the quality of testing, it is
crucially important not to confuse it with the quality of
the software being tested. This confusion can be
seen when the goodness of testing is measured by
characteristics of the program derived from testing,
e.g., number of failures found, number of faults
found, mean-time-to-failure, etc. These properties
presume quality testing, rather than define it. Even
correct software can be poorly tested. Useful
adequacy criteria seek to improve the likelihood of
finding faults in the software, if they exist, regardless of
the quality of the software being tested.

Adequacy criteria act as both specifier and judge: as
specifier by indicating the constraints that must be
satisfied by the testing, and as judge by indicating
deficiencies in a particular test. Adequacy criteria
for testing are generally expressed by stating some
required coverage the test data should achieve.
Desirable coverages include the required features, the
software structure, or the potential errors that might
occur in the life cycle. These coverages are
discussed in depth below.

⁴ If a software unit executes properly with integer inputs 1-43 and
45-100, a reasonable person testing the unit and having no cause to
believe the integer 44 is in any way special, is likely to conclude the unit
performs satisfactorily for inputs 1-100.

No one testing technique is so clearly superior to
others that its exclusive use can be justified. Testing
techniques are best seen as complementary rather
than competing forms of verification; different
techniques tend to catch different faults. This fact is
reflected in the way testing techniques themselves
are classified in this module: specification-oriented,
implementation-oriented, and error-oriented.
Techniques in each category focus on particular
characteristics of the software system, leaving them
susceptible to failing to uncover particular kinds of
faults. Refinements introduced by the programmer
for efficiency may not be executed in a
specification-oriented test, for example, and a case
required by the specification but omitted from the
code may not be tested in an
implementation-oriented test. If the errors sought in an
error-oriented test are too limited, faults detectable by
other methods may be missed.
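
The second of these blind spots, a specified case omitted from the
code, is easy to reproduce. In the Python sketch below, the
specification and the unit grade are hypothetical: we assume the
specification requires a ValueError for scores outside 0 to 100,
and the code deliberately leaves that check out.

```python
def grade(score):
    # Faulty on purpose: the range check assumed by the
    # (hypothetical) specification is missing.
    if score >= 60:
        return "pass"
    return "fail"

def raises_value_error(func, arg):
    """True if calling func(arg) raises ValueError, else False."""
    try:
        func(arg)
    except ValueError:
        return True
    return False

# Implementation-oriented data: one value per branch of the code as written.
# Every statement and branch is executed, and every result is as expected.
implementation_oriented_passes = all(
    grade(score) == expected for score, expected in [(70, "pass"), (30, "fail")]
)

# Specification-oriented data: a value from the "out of range" class named
# in the specification. The missing check is revealed.
specification_oriented_passes = raises_value_error(grade, 101)
```

The implementation-oriented data cannot reveal the fault because
there is no code for the missing case to cover; only data derived
from the specification reaches it.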

Testing is a principal activity used to assess software
quality. Such assessment is a subjective judgment
as to the suitability of a given technique or product
for a particular purpose. Evaluating software
requires amassing as much information as possible
about its quality. Information produced by testing is
a valuable component in that evaluation, since all
other forms of verification are further removed from
the operational environment of the software.
Inferences derived from particular test executions must
be tempered, however, by considering the environmental
factors (e.g., implicit inputs, compilers, operating
systems, hardware, etc.) that influence the program’s
behavior. Test results are therefore
susceptible to misinterpretation in a manner similar to
other verification techniques.

3.  Limitations  of  testing 

Some problems cannot be solved on a computer
because they are either intractable or undecidable. An
intractable problem is one whose best known
solution requires inordinate resources. An undecidable
problem is one for which no algorithmic solution is
possible. There are many such intractable and
undecidable problems associated with analysis and
testing. In general, programs cannot be exhaustively
tested (tested for each input) because to do so is both
intractable and undecidable. Huang shows that to
test exhaustively a program that reads two 32-bit
integers would take on the order of 50 billion years
[Huang75]! Even if the input space is smaller, on the
very first input it may be the case that the program
does not halt within a reasonable time. It may even
be the case that it is obvious the correct output will
be produced if the program ever does halt.
Exhaustive testing can only be completed, therefore, if all
non-halting cases can be detected and eliminated.
The problem of effecting such detection, however, is
undecidable.
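
The arithmetic behind the intractability claim is short. Two
32-bit integers give 2^64 distinct inputs, so the time for an
exhaustive test is 2^64 multiplied by the time per test. The
Python sketch below works this out for two per-test times. Those
times are assumptions of ours, not figures taken from [Huang75],
so the totals differ from the one cited above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year

def exhaustive_test_years(input_bits, seconds_per_test):
    """Years needed to run one test per possible input, one after another."""
    return (2 ** input_bits) * seconds_per_test / SECONDS_PER_YEAR

# Assumed per-test times: one millisecond and one microsecond.
for seconds_per_test in (1e-3, 1e-6):
    years = exhaustive_test_years(64, seconds_per_test)
    print(f"{seconds_per_test:g} s per test: about {years:.3g} years")
```

Under these assumptions the total is several hundred million
years at one millisecond per test, and still several hundred
thousand years at one microsecond per test.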

Another limitation on the power of testing is its
reliance on an oracle. An oracle is a mechanism that
judges whether or not a given output is correct for a
given input. In some cases, no oracle may be
available, e.g., when the program is written to compute
an answer that cannot, in practice, be computed by
hand. Imperfect oracles may be available, but their
use is risky. The absence of an oracle, or the
presence of an imperfect oracle, weakens
significantly any conclusions drawn from testing.
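
The risk of an imperfect oracle can be shown with a sorting unit.
In the Python sketch below, all names are hypothetical. The first
oracle judges output against a trusted reference (here the
built-in sorted); the second checks only that the output is in
order, and so accepts a faulty unit that loses data.

```python
def reference_oracle(inputs, output):
    """Judge the output against a trusted reference implementation."""
    return output == sorted(inputs)

def ordering_oracle(inputs, output):
    """Imperfect oracle: checks ordering, but not that every input survives."""
    return all(a <= b for a, b in zip(output, output[1:]))

def faulty_sort(values):
    """Hypothetical faulty unit: silently drops duplicate values."""
    return sorted(set(values))

data = [3, 1, 3, 2]
result = faulty_sort(data)

accepted_by_imperfect = ordering_oracle(data, result)
accepted_by_reference = reference_oracle(data, result)
```

A reference implementation is itself software and may be wrong or
unavailable, which is why even the stronger oracle here should be
read as an assumption rather than a guarantee.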

4.  Organization  of  this  module 

The remainder of this module is divided into four
sections, discussing analysis techniques, testing
techniques, methods for evaluating testing
techniques, and methods for managing testing. Program
analysis is treated first, as each testing technique
uses one or more methods of analysis. We classify
analysis methods according to the view of the
software implicit in each technique. Our primary
classification of testing techniques is according to the
adequacy criteria they seek to satisfy. Secondary
classifications refine the primary taxonomy as
appropriate. Once a thorough background is laid,
evaluation and managerial issues are addressed.

II.  Program  Analysis  Techniques 

Any technique that seeks to determine software
characteristics is a form of program analysis. Software
characteristics are essential in development, debugging,
documentation, verification, evaluation, certification,
and maintenance. This section discusses analysis
techniques that support verification in general, and testing
in particular. In addition, analysis can help decide
where to focus testing.

Analysis is employed in all stages of testing, including
test data selection, program execution, data collection,
and assessment of the results. Test data can be selected
based upon consideration of various information
sources, including the specification, the
implementation, and potential errors and faults. Collecting
computational information may require analysis of the
program text and subsequent instrumentation of the
program. Program execution itself is a form of analysis.
Establishing an oracle for verification may require
additional analysis.

The  analysis  techniques  discussed  here  are  classified 
according  to  the  view  they  take  of  the  program.  Each 
view  empihasizes  different  aspects  of  the  program  and 
enables  the  determination  of  different  program  charac¬ 
teristics.  Several  predominant  views  are  described  be¬ 
low,  along  with  the  analysis  techniques  they  support. 
Some  analysis  techniques  employ  more  than  one  view. 
The  views  below  are  roughly  ordered  by  increasing 
information content.

1.  From  a  textual  view 

From a textual view, a program is treated as a
sequence of characters or tokens. Many primitive
metrics, such as program length and frequency of
occurrence of identifiers, take this view. Text
editors manipulate a program as a textual object, as do
scanners, line counters, etc. Coding guidelines are
frequently expressed from this viewpoint. Simple
prettyprinters can be based on this view.
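
As an illustration of the textual view, the following sketch computes a few primitive metrics directly from the program text, without parsing it. The sample program and the function name `textual_metrics` are hypothetical and are not taken from any cited tool.

```python
import re
from collections import Counter

# Hypothetical sample program text; any source string would do.
SAMPLE = """\
total = 0
for item in items:
    total = total + item
print(total)
"""


def textual_metrics(source):
    """Compute simple metrics from the raw text, with no parsing."""
    lines = source.splitlines()
    # A word token is a letter or underscore followed by letters,
    # digits, or underscores. Keywords are counted like identifiers,
    # because a purely textual view cannot tell them apart.
    words = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", source)
    return {
        "line_count": len(lines),
        "char_count": len(source),
        "word_frequency": Counter(words),
    }


metrics = textual_metrics(SAMPLE)
```

Because no grammar is consulted, the count for `for` is reported alongside the counts for genuine identifiers; separating the two requires the syntactic view.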

2.  From  a  syntactic  view 

A  program  may  be  viewed  as  a  hierarchy  of  syntac¬ 
tic  elements  determined  by  the  programming 
language’s  grammar.  Programs  decompose  into 
subprograms  that  decompose  into  statement  groups 
that  decompose  into  statements,  etc.,  until  the  token 
level  is  reached.  This  syntactic  view  can  be  ob¬ 
tained  from  a  textual  view  (e.g.,  by  a  parser)  or  may 
be  constructed  directly  (e.g.,  by  a  syntax-directed 
editor).  Derivable  program  characteristics  include 
statement counts, identifier cross references, program
call graphs (what procedure calls what
procedure),  declared  and  undeclared  variables,  fre¬ 
quency  of  variable  use,  and  so  on.  Many  sophis¬ 
ticated  program  metrics  are  based  on  this  view. 
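
A call graph is one characteristic that can be derived from the syntactic view. The sketch below uses Python's standard `ast` module as the parser; the sample program and the function name `call_graph` are assumptions made for illustration, and only calls through simple names are recorded.

```python
import ast

# Hypothetical program to be analyzed.
SAMPLE = """\
def read():
    return 1

def compute():
    return read() + read()

def main():
    print(compute())
"""


def call_graph(source):
    """Map each top-level function to the set of names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            # Walk the syntax tree of the function body and collect
            # every call whose target is a plain name.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = callees
    return graph


graph = call_graph(SAMPLE)
```

The same traversal could collect statement counts or identifier cross references by testing for other node types.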

The  syntactic  view  supports  program  instrumenta¬ 
tion,  in  which  source  or  object  code  is  modified  to 
divulge  its  internal  workings  as  it  executes.  During 
such  execution,  a  variety  of  program  characteristics 
can  be  determined,  such  as  what  statements  and 
branches  are  executed.  Execution  counts  or  even 
complete  trace  information  indicating  the  value 
computed  by  each  expression  can  be  generated. 
Probert  presents  algorithms  for  such  instrumentation 
(using  control  flow  graphs — see  next  section) 
[Probert82],  and  Beizer  discusses  the  levels  of  in¬ 
strumentation  and  their  resulting  impact  [Beizer90]. 
Instrumented  code  will  necessarily  be  larger  and 
slower,  so  that  executing  it  may  mask  timing,  size, 
and  position-related  faults. 

3.  From  a  control  flow  view 

A  program’s  control  flow  relation  relates  program 
elements  according  to  their  execution  order.  A  pro¬ 
gram  element  is  usually  a  condition,  a  single  state¬ 
ment,  or  a  block  of  statements.  If  element  B  can  be 
executed  immediately  after  element  A,  then  (A,  B)  is 
in  the  control  flow  relation  of  the  program.  Succes¬ 
sor  execution  is  determined  independent  of  the  com¬ 
putation  performed  at  A.  Thus,  if  A  refers  to  the 
condition  and  B  refers  to  the  write  statement  in 

if  x<x  then  write(x) 

then  (A,  B)  is  still  in  the  control  flow  relation  despite 
the  fact  that  A  is  always  false. 

The  graph  corresponding  to  the  control  flow  relation 
is called a control flow graph [Hecht77]. Each node
of  the  graph  corresponds  to  a  program  element;  a 
directed  arc  between  two  nodes  indicates  the  cor¬ 
responding  two  elements  form  an  ordered  pair  in  the 
control flow relation. A path through a control flow
graph corresponds to a potentially executable
sequence of program elements. To execute a path is to
execute the corresponding sequence of program
elements. If a program input exists that causes
execution of a path, that path is called feasible; otherwise
it is called infeasible. Programs with loops usually
have infinitely many paths; even without loops, a
program may have intractably many paths to
analyze.

Control  flow  graphs  have  no  labels  or  other  annota¬ 
tions,  distinguishing  them  from  flowcharts,  which 
capture  additional  program  semantics.  Control  flow 
graphs  are  generally  produced  from  a  syntactic  view 
of  a  program  for  more  efficient  processing  of  control 
flow  information.  There  are  many  program  metrics 
based upon the control flow view of a program.
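
The following sketch represents the control flow graph of the fragment `if x<x then write(x)` as an adjacency table and enumerates its paths. The node names A and B follow the text; the `exit` node, the function names, and the sampled input range are assumptions. Sampling inputs can show that a path is feasible but cannot prove that a path is infeasible.

```python
# Control flow graph for: if x < x then write(x)
CFG = {
    "A": ["B", "exit"],  # A: the condition x < x
    "B": ["exit"],       # B: write(x)
    "exit": [],
}


def all_paths(cfg, node, exit_node):
    """Enumerate every path of an acyclic control flow graph."""
    if node == exit_node:
        return [[node]]
    paths = []
    for successor in cfg[node]:
        for tail in all_paths(cfg, successor, exit_node):
            paths.append([node] + tail)
    return paths


def executed_path(x):
    """Run the fragment on one input and report the path taken."""
    path = ["A"]
    if x < x:
        path.append("B")
    path.append("exit")
    return path


paths = all_paths(CFG, "A", "exit")
observed = {tuple(executed_path(x)) for x in range(-5, 6)}
```

The graph contains two paths, but only the path that bypasses B is ever observed, which is consistent with (A, B) being in the control flow relation even though the path through B is infeasible.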

4.  From  a  data  flow  view 

The  data  flow  relation  determined  by  a  program  re¬ 
lates  program  elements  according  to  their  data  ac¬ 
cess  behavior.  If  element  B  uses  (refers  to)  a  data 
object  that  was  potentially  defined  at  element  A, 
then  (A,  B)  is  in  the  data  flow  relation  of  the  pro¬ 
gram.  A  data  flow  graph  [Fosdick76,  Hecht77]  is  a 
directed,  labeled  graph  corresponding  to  the  data 
flow  relation,  in  which  nodes  correspond  to  program 
elements  and  directed  arcs  connect  A  to  B  with  label 
v if (A, B) is in the data flow relation due to a definition
of v at A and a use of v at B. Data flow graphs
can  be  produced  from  the  syntactic  view  for  more 
efficient  processing  of  data  flow  information. 

A  program  can  be  represented  as  a  flow  graph  an¬ 
notated  with  information  about  variable  definitions, 
references, and undefinitions. From this representation,
information about data flow can be deduced
for  use  in  code  optimization,  anomaly  detection,  and 
test data generation [Hecht77, Muchnick81].

Data  flow  anomalies  are  flow  conditions  that 
deserve  further  investigation,  as  they  may  indicate 
problems. Examples include: defining a variable
twice  with  no  intervening  reference,  referencing  a 
variable  that  is  undefined,  and  undefining  a  variable 
that  has  not  been  referenced  since  its  last  definition. 
Algorithms  for  detecting  these  anomalies  are  given 
in  [Fosdick76]  and  [Osterweil76],  and  are  refined  and 
corrected in [Jachner84].
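
The three anomalies named above can be illustrated on a single execution path. The sketch below is a simplification that scans one sequence of definition (d), reference (r), and undefinition (u) events; it is not the flow-graph algorithm of the cited papers, and the event encoding and function name are assumptions.

```python
# Anomalous pairs of consecutive actions on the same variable:
#   dd: defined twice with no intervening reference
#   ur: referenced while undefined
#   du: undefined without being referenced since its last definition
ANOMALOUS_PAIRS = {"dd", "ur", "du"}


def find_anomalies(events):
    """Return (index, variable, pair) for each anomaly on one path."""
    last_action = {}  # variables not yet seen are treated as undefined
    anomalies = []
    for index, (action, variable) in enumerate(events):
        pair = last_action.get(variable, "u") + action
        if pair in ANOMALOUS_PAIRS:
            anomalies.append((index, variable, pair))
        last_action[variable] = action
    return anomalies


# Hypothetical path: x := 1; x := 2; write(y); z := 3; undefine z
PATH = [("d", "x"), ("d", "x"), ("r", "y"), ("d", "z"), ("u", "z")]
anomalies = find_anomalies(PATH)
```

A complete analysis must consider every path through the annotated flow graph, which is what the cited algorithms accomplish without enumerating paths.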

A program slice results from eliminating all statements
that cannot affect the computation of an expression
at a specified location [Weiser84]. Korel
adapts  slicing  to  testing  and  debugging  in  [Korel88a], 
[Korel88b],  and  [Korel90b].  His  method  employs  a 
variation  on  a  data  flow  graph  called  a  program  de¬ 
pendence  graph  [Korel87]. 
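
For straight-line code, a slice can be computed by a single backward pass over the statements. The sketch below assumes a hypothetical program representation in which each statement records the variable it defines and the variables it uses; it ignores control dependence and therefore does not reproduce the general methods cited above.

```python
# Each statement: (line number, defined variable, used variables)
PROGRAM = [
    (1, "a", set()),       # a := read()
    (2, "b", set()),       # b := read()
    (3, "s", {"a"}),       # s := a * 2
    (4, "t", {"b"}),       # t := b + 1
    (5, "r", {"s", "a"}),  # r := s + a
]


def backward_slice(program, variable):
    """Lines that can affect `variable` at the end of the program."""
    relevant = {variable}
    kept = []
    for line, defined, used in reversed(program):
        if defined in relevant:
            kept.append(line)
            # The definition satisfies the demand for this variable;
            # the variables it uses become relevant in turn.
            relevant.discard(defined)
            relevant |= used
    return sorted(kept)


slice_for_r = backward_slice(PROGRAM, "r")
```

Statements 2 and 4 are eliminated from the slice for r because they cannot affect its value.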

5.  From  a  computation  flow  view 

A program can be viewed as a finite representation
of a (potentially infinite) set of computations. A
computation is a trace of the data states* produced
by  a  program  when  executing  a  particular  input.  A 
thorough  analysis  of  the  computation  flow  of  a  pro¬ 
gram  induced  by  an  execution  may  serve  to  estimate 
the  number  of  faults  remaining  in  the  code,  the 
strength  of  the  test  data  to  catch  faults,  and  the  abil¬ 
ity  of  the  program  to  hide  faults. 

Fault  seeding  is  a  statistical  method  used  to  assess 
the  number  and  nature  of  the  faults  remaining  in  a 
program.  A  reprint  of  Harlan  Mills’  original 
proposal  for  this  technique  (where  he  calls  it  error 
seeding) appears in [Mills83]. First, faults are seeded
into  a  program.  Then  the  program  is  tested,  and  the 
number  of  faults  discovered  is  used  to  estimate  the 
number  of  faults  yet  undiscovered.  A  difficulty  with 
this  technique  is  that  the  faults  seeded  must  be  repre¬ 
sentative  of  the  yet-undiscovered  faults  in  the  pro¬ 
gram. 
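
One commonly presented form of the estimate is a simple ratio: if testing finds a given fraction of the seeded faults, it is assumed to have found the same fraction of the indigenous faults. The sketch below encodes that ratio; the function and parameter names are hypothetical, and the estimate is only as good as the assumption that seeded faults are representative.

```python
def estimate_remaining_faults(seeded_total, seeded_found, indigenous_found):
    """Estimate the indigenous faults not yet discovered.

    Assumes testing finds seeded and indigenous faults at the same rate.
    """
    if seeded_found == 0:
        raise ValueError("no seeded faults found; the ratio is undefined")
    # Estimated total indigenous faults = found / (fraction of seeds found).
    estimated_total = indigenous_found * seeded_total / seeded_found
    return estimated_total - indigenous_found


# Hypothetical campaign: 20 faults seeded, 10 of them found,
# together with 6 faults that were not seeded.
remaining = estimate_remaining_faults(20, 10, 6)
```

In this hypothetical campaign half of the seeded faults were found, so the 6 indigenous faults found are taken to be half of 12, leaving an estimated 6 undiscovered.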

Mutation  analysis  uses  fault  seeding  to  investigate 
properties  of  test  data  [Hamlet77a,  Hamlet77b, 
DeMillo78a, DeMillo88]. Programs with seeded
faults  are  called  mutants.  Mutants  are  executed  to 
determine  whether  or  not  they  behave  differently 
from  the  original  program.  Mutants  that  behave  dif¬ 
ferently  are  said  to  have  been  killed  by  the  test.  The 
product  of  mutation  analysis  is  a  measure  of  how 
well  test  data  kill  mutants.  Mutants  are  produced  by 
applying  a  mutation  operator.  Such  an  operator 
changes  a  single  expression  in  the  program  to  anoth¬ 
er  expression,  selected  from  a  finite  class  of  expres¬ 
sions.  For  example,  a  constant  might  be  incre¬ 
mented  by  one,  decremented  by  one,  or  replaced  by 
zero,  yielding  one  of  three  mutants.  Applying  the 
mutation  operators  at  each  point  in  a  program  where 
they are applicable forms a finite, albeit large, set of
mutants. 
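
The constant-replacement example can be sketched as follows. The unit under test (multiplication by the constant 3) and all names are hypothetical; the three mutants correspond to incrementing the constant, decrementing it, and replacing it by zero.

```python
def make_program(constant):
    """Build the unit under test with the given constant."""
    def program(x):
        return constant * x
    return program


ORIGINAL = make_program(3)
MUTANTS = {
    "constant + 1": make_program(4),
    "constant - 1": make_program(2),
    "constant = 0": make_program(0),
}


def mutation_score(test_inputs):
    """Fraction of mutants killed by the test data."""
    killed = [
        name
        for name, mutant in MUTANTS.items()
        # A mutant is killed if some test input makes its output
        # differ from that of the original program.
        if any(mutant(x) != ORIGINAL(x) for x in test_inputs)
    ]
    return len(killed) / len(MUTANTS)


weak_score = mutation_score([0])
strong_score = mutation_score([0, 1])
```

The input 0 kills no mutant, since every variant returns 0; adding the input 1 kills all three.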

Three  conditions  necessary  and  sufficient  for  a  fault 
to  cause  a  program  failure  are  execution,  infection, 
and  propagation  [Morell90,  Voas91].  The  fault  loca¬ 
tion  must  be  executed,  the  resulting  data  state  must 
be  infected  with  an  erroneous  value,  and  the  suc¬ 
ceeding  computation  must  propagate  the  infection 
through  erroneous  data  states,  producing  a  failure. 
[Richardson88]  and  [Morell90]  discuss  necessary 
conditions  for  infection  and  propagation  to  occur. 
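
The three conditions can be observed separately in a small example. The sketch below assumes a hypothetical unit in which the correct statement `y := x + x` has been replaced by the faulty `y := x * x`; the function names and the chosen inputs are illustrative only.

```python
def run(x, faulty):
    """Execute the unit and report whether the fault location ran,
    the data state (y) after it, and the final output."""
    if x == 0:
        return {"executed": False, "y": None, "output": False}
    y = x * x if faulty else x + x  # fault: '*' written for '+'
    return {"executed": True, "y": y, "output": y > 5}


def classify(x):
    """Say how far the fault progresses toward a failure on input x."""
    good, bad = run(x, faulty=False), run(x, faulty=True)
    if not bad["executed"]:
        return "not executed"
    if bad["y"] == good["y"]:
        return "executed, no infection"
    if bad["output"] == good["output"]:
        return "infected, no propagation"
    return "failure"


results = {x: classify(x) for x in (0, 2, 3, -3)}
```

Only the input -3 satisfies all three conditions. The input 2 executes the fault without infecting the data state, and the input 3 infects the data state, but the infection is masked by the comparison that produces the output.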

Sensitivity analysis [Foster80, Voas91] investigates
the three conditions required for failure, with
particular focus on infection and propagation of errors.
Infection analysis employs mutation analysis to
determine the probability of a data state’s being
infected after a potentially faulty statement is
executed. Propagation analysis mutates the data state
to determine the probability that an infected data
state will cause a program failure. [Voas91] gives
means of estimating the probability that execution,
infection, and propagation will occur for faults and
data states from particular classes. [Morell88] and
[Morell90] use symbolic execution (see below) to
analyze potential error flow. Symbolic faults are
introduced into the program, and the program is
symbolically executed. The symbolic output captures the
effect the fault would have on the program
computation.

*The data state of a program is the mapping of program variables
(including temporaries and the program counter) to values.

6.  From  a  functional  view 

Programs may be viewed as functions by considering
them as denotations for a set of ordered pairs
(x, y), where y is the output produced by the program
that halts on input x [Mills75]. Executing the
program for input x and observing its output (if any) is
an analysis technique that provides direct evidence
of the program’s function on the given input.

Symbolic analysis seeks to describe the function
computed by a program in a more general way. A
symbolic execution system accepts three inputs: a
program to be interpreted, symbolic input for the
program, and the path to follow. It produces two
outputs: the symbolic output that describes the
computation of the selected path, and the path condition
for that path [Hantler76]. The specification of the
path can be either interactive [Clarke76] or
preselected [Howden77, Howden78b]. The symbolic
output can be used to prove the program correct with
respect to its specification, and the path condition
can be used for generating test data to exercise the
desired path. Structured data types cause
difficulties, however, since it is sometimes impossible
to deduce what component is being modified.
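
A symbolic execution system with the three inputs and two outputs described above can be sketched for a toy language. Everything here is an assumption made for illustration: the program representation (nested tuples with Python lambdas as expressions), the `Sym` class that records formulas as text, and the convention that the symbolic input for variable x is written X. No formula simplification is attempted.

```python
class Sym:
    """A symbolic value, represented by the text of its formula."""

    def __init__(self, text):
        self.text = text

    def __neg__(self):
        return Sym(f"-{self.text}")

    def __add__(self, other):
        return Sym(f"({self.text} + {as_text(other)})")

    def __lt__(self, other):
        return Sym(f"{self.text} < {as_text(other)}")


def as_text(value):
    return value.text if isinstance(value, Sym) else str(value)


def symbolic_execute(program, input_names, path):
    """Follow `path` (one boolean per branch encountered) through
    `program`; return the symbolic outputs and the path condition."""
    state = {name: Sym(name.upper()) for name in input_names}
    decisions = iter(path)
    conditions = []

    def run(block):
        for statement in block:
            if statement[0] == "assign":
                _, variable, expression = statement
                state[variable] = expression(state)
            else:  # ("if", condition, then_block, else_block)
                _, condition, then_block, else_block = statement
                formula = condition(state).text
                if next(decisions):
                    conditions.append(formula)
                    run(then_block)
                else:
                    conditions.append(f"not ({formula})")
                    run(else_block)

    run(program)
    outputs = {name: value.text for name, value in state.items()}
    return outputs, " and ".join(conditions)


# if x < 0 then y := -x else y := x
PROGRAM = [
    ("if", lambda s: s["x"] < 0,
     [("assign", "y", lambda s: -s["x"])],
     [("assign", "y", lambda s: s["x"])]),
]

then_outputs, then_condition = symbolic_execute(PROGRAM, ["x"], [True])
else_outputs, else_condition = symbolic_execute(PROGRAM, ["x"], [False])
```

Any input satisfying a path condition is test data that exercises the corresponding path; for example, any negative x satisfies the condition of the first path.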

III.  Program  Testing  Techniques 

Testing is verification that relies on program execution.
It  includes  all  the  activities  associated  with  test  data 
selection,  program  instrumentation  and  execution,  and 
analysis  of  the  results. 

Since the conclusions of testing are drawn from
execution-derived characteristics, the validity of the
conclusions is directly related to the accuracy with
which the execution in the test environment models an
execution in the target environment. Care must be
taken to ensure that all environmental factors are
considered in assessing these characteristics. For example,
all implicit inputs must be considered (e.g., the system
clock, the state of files, the load location of the unit), as
well as how representative the test environment is of
the actual environment (e.g., same compiler, loader,
operating system, computer, input distribution).

The relationship of the test environment to the “real”
execution environment is of particular concern for
symbolic execution. Relying on an interpreter raises
additional concerns as to faithfulness to the
programming language and hardware specifications. It is
impossible to be dogmatic about what should be
considered a representative execution. It is usually better to
err on the side of caution when interpreting the results
of a given execution, however.

Test  data  selection  may  be  guided  by  several  sources: 
the  specification,  the  implementation,  potential  errors 
in  the  programming  process,  or  some  combination 
thereof.  The  testing  techniques  discussed  below  are 
organized  according  to  these  diverse  sources. 
Specification-oriented  testing  seeks  to  show  that  every 
required  software  feature  is  addressed  by  some  aspect 
of  the  software.  Implementation-oriented  testing  at¬ 
tempts  to  show  that  the  implementation  contains  no 
surprises, by showing that various aspects of the code
can  be  exercised  without  violating  the  requirements. 
Error-oriented  testing  seeks  to  show  that  certain  errors 
in  the  programming  process  have  not  occurred. 

1.  Specification-oriented  testing 

Program testing is specification-oriented when test
data are developed from documents and
understandings intended to specify a module’s behavior.
Sources include, but are not limited to, the actual
written specification and the high- and low-level
designs of the code to be tested [Howden80a]. The
goal is to test for the presence of each (required)
software feature, including the input domains, the
output domains, categories of inputs that should
receive equivalent processing, and the processing
functions themselves.

Specification-oriented  testing  seeks  to  show  that 
every  requirement  is  addressed  by  the  software.  An 
unimplemented  requirement  may  be  reflected  in  a 
missing  path  or  missing  code  in  the  software. 
Specification-oriented  testing  assumes  a  functional 
view  of  the  software  and  sometimes  is  called 
functional  or  black-box  testing  [Howden86]. 

a.  Testing  independent  of  the  specification 
technique 

Specifications  detail  the  assumptions  that  may  be 
made about a given software unit. They must
describe  the  interface  through  which  access  to  the 
unit  is  given,  as  well  as  the  behavior  once  such 
access  is  given.  The  interface  of  a  unit  includes 
the  features  of  its  inputs,  its  outputs,  and  their 
related  value  spaces  (called  domains).  The 
behavior  of  a  module  always  includes  the 
function(s)  to  be  computed  (its  semantics),  and 
sometimes  the  runtime  characteristics,  such  as  its 
space  and  time  complexity.  Specification- 
oriented  testing  derives  test  data  from  aspects  of 
the  specification. 


(i)  Testing  based  on  the  interface 

Testing based on the interface of a module
selects test data based on the features of the input
and output domains of the module and their
interrelationships.

(1)  Input  domain  testing 

In extremal testing, test data are chosen to
cover the extremes of the input domain.
Similarly, midrange testing selects data from
the interiors of domains. The motivation is
inductive: it is hoped that conclusions
about the entire input domain can be drawn
from the behavior elicited by some
representative members of it [Myers79]. For
structured input domains, combinations of
extremal points for each component are
chosen. This procedure can generate a large
quantity of data, though considerations of
the inherent relationships among
components can ameliorate this problem somewhat
[Howden80b].
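
For a structured input domain, the combinations of extremal points can be generated mechanically. The sketch below assumes a hypothetical three-component input (a time of day) whose components are independent integer ranges; the names are illustrative.

```python
from itertools import product

# Hypothetical structured input: each component with its extremes.
DOMAINS = {"hour": (0, 23), "minute": (0, 59), "second": (0, 59)}


def extremal_tests(domains):
    """All combinations of the extreme values of each component."""
    names = list(domains)
    combinations = product(*(domains[name] for name in names))
    return [dict(zip(names, values)) for values in combinations]


def midrange_test(domains):
    """One point from the interior of every component's range."""
    return {name: (low + high) // 2 for name, (low, high) in domains.items()}


extremes = extremal_tests(DOMAINS)
middle = midrange_test(DOMAINS)
```

With n components the number of extremal combinations is 2 to the power n, which illustrates how quickly the quantity of test data can grow.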

(2)  Equivalence  partitioning 

Specifications  frequently  partition  the  set  of 
all  possible  inputs  into  classes  that  receive 
equivalent  treatment.  Such  partitioning  is 
called equivalence partitioning [Myers79].
A result of equivalence partitioning is the
identification of a finite set of functions and
their  associated  input  and  output  domains. 
For  example,  the  specification 

{(x, y) | x ≥ 0 ⊃ y = x  &  x < 0 ⊃ y = -x}

partitions  the  input  into  two  sets,  associated, 
respectively,  with  the  identity  and  negation 
functions.  Input  constraints  and  error  con¬ 
ditions  can  also  result  from  this  partitioning. 
Once  these  partitions  have  been  developed, 
both  extremal  and  midrange  testing  are  ap¬ 
plicable  to  the  resulting  input  domains. 
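
The selection of extremal and midrange points from each partition can be sketched for the absolute-value specification above. The sketch assumes that zero belongs to the identity class and that the input domain is bounded by hypothetical limits LOW and HIGH; the unit under test is a stand-in.

```python
LOW, HIGH = -1000, 1000  # assumed bounds of the input domain

# One entry per equivalence class: its input domain and its function.
PARTITIONS = {
    "identity": {"domain": (0, HIGH), "expected": lambda x: x},
    "negation": {"domain": (LOW, -1), "expected": lambda x: -x},
}


def select_tests(partitions):
    """Pick both extremes and a midrange point from each class."""
    tests = []
    for name, partition in partitions.items():
        low, high = partition["domain"]
        for x in (low, (low + high) // 2, high):
            tests.append((name, x, partition["expected"](x)))
    return tests


def unit_under_test(x):
    """Stand-in for the implementation being tested."""
    return abs(x)


tests = select_tests(PARTITIONS)
failures = [(name, x) for name, x, expected in tests
            if unit_under_test(x) != expected]
```

The extremes 0 and -1 lie on either side of the boundary between the two classes, where an implementation that misplaces the boundary is most likely to be exposed.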

[Duran81],  [Duran84],  and  [Hamlet90]  com¬ 
pare  equivalence  partitioning  to  random  test¬ 
ing,  on  the  basis  of  statistical  confidence  in 
the  probability  of  failure  after  testing  is 
complete. 

(3)  Syntax  checking 

Every  robust  program  must  parse  its  input 
and  handle  incorrectly  formatted  data.  Veri¬ 
fying  this  feature  is  called  syntax  checking 
[Beizer90].  One  means  of  accomplishing 
this  is  to  execute  the  program  using  a  broad 
spectrum  of  test  data.  By  describing  the 
data  with  a  BNF  grammar,  instances  of  the 
input  language  can  be  generated  using  algo¬ 
rithms  from  automata  theory.  [Duncan81] 
and [Bazzichi82] describe systems that provide
limited control over the data to be generated.
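
Generating instances of an input language from a grammar can be sketched with a simple random derivation. This is not the method of the cited systems; the grammar (signed integers), the depth limit, and all names are assumptions made for illustration.

```python
import random
import re

# Hypothetical BNF grammar for signed integers. Each nonterminal maps
# to a list of alternatives; each alternative is a list of symbols.
GRAMMAR = {
    "<integer>": [["<digits>"], ["-", "<digits>"]],
    "<digits>": [["<digit>"], ["<digit>", "<digits>"]],
    "<digit>": [[d] for d in "0123456789"],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Derive a random string from `symbol`."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Force termination by taking the shortest alternative.
        alternatives = [min(alternatives, key=len)]
    production = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in production)


def is_valid(text):
    """Independent check that a string belongs to the language."""
    return re.fullmatch(r"-?[0-9]+", text) is not None


rng = random.Random(0)
samples = [generate("<integer>", rng) for _ in range(50)]
```

Testing how a program handles incorrectly formatted data additionally requires strings outside the language, for example obtained by corrupting generated instances.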

(ii)  Testing  based  on  the  function  to  be 
computed 

Equivalence partitioning results in the identification
of a finite set of functions and their associated
input and output domains. Test data can
be  developed  based  on  the  known  character¬ 
istics  of  these  functions.  Consider,  for  ex¬ 
ample,  a  function  to  be  computed  that  has  fixed 
points,  i.e.,  certain  of  its  input  values  are 
mapped  into  themselves  by  the  function.  Test¬ 
ing  the  computation  at  these  fixed  points  is 
possible,  even  in  the  absence  of  a  complete 
specification  [Weyuker82].  Knowledge  of  the 
function  is  essential  in  order  to  ensure  adequate 
coverage  of  the  output  domains. 

(1)  Special  value  testing 

Selecting test data on the basis of features of
the function to be computed is called special
value testing [Howden80b]. This procedure
is particularly applicable to mathematical
computations. Properties of the function to
be computed can aid in selecting points that
will indicate the accuracy of the computed
solution. For example, the periodicity of the
sine function suggests use of test data values
that differ by multiples of 2π. Such
characteristics are not unique to mathematical
computations. Most prettyprinters, for
example, when applied to their own output,
should reproduce it unchanged. Some word
processors behave this way as well.
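
Both properties mentioned above can be checked without knowing the expected output for any individual input. In the sketch below, `math.sin` stands in for the sine routine under test and `normalize_spaces` is a hypothetical stand-in for a prettyprinter; the tolerance is an assumption.

```python
import math


def check_periodicity(f, x, periods=3, tolerance=1e-9):
    """True if f(x + 2*pi*k) agrees with f(x) for k = 1..periods."""
    return all(
        abs(f(x + 2 * math.pi * k) - f(x)) <= tolerance
        for k in range(1, periods + 1)
    )


def normalize_spaces(text):
    """Hypothetical formatter: collapse runs of whitespace."""
    return " ".join(text.split())


def is_idempotent(f, value):
    """True if applying f to its own output changes nothing."""
    once = f(value)
    return f(once) == once


sine_ok = check_periodicity(math.sin, 0.5)
formatter_ok = is_idempotent(normalize_spaces, "  a   b  c ")
```

A failure of either check reveals a fault; success does not establish that the individual values are correct.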

(2)  Output  domain  coverage 

For each function determined by
equivalence partitioning, there is an associated
output domain. Output domain coverage is
performed by selecting points that will cause
the extremes of each of the output domains
to be achieved [Howden80b]. This ensures
that units have been checked for maximum
and minimum output conditions and that all
categories of error messages have, if
possible, been produced. In general,
constructing such test data requires knowledge of the
function to be computed and, hence,
expertise in the application area.

b.  Testing  dependent  on  the  specification 
technique 

The specification technique employed can aid in
testing. An executable specification can be used
as an oracle and, in some cases, as a test
generator. Structural properties of a specification can
guide the testing process. If the specification falls
within certain limited classes, properties of those
classes can guide the selection of test data. Much
work remains to be done in this area of testing.

(i)  Algebraic 

In  algebraic  specification,  properties  of  a  data 
abstraction  are  expressed  by  means  of  axioms 
or  rewrite  rules.  In  one  testing  system, 
DAISTS,  the  consistency  of  an  algebraic  speci¬ 
fication  with  an  implementation  is  checked  by 
testing  [Gannon81].  Each  axiom  is  compiled 
into  a  procedure,  which  is  then  associated  with 
a  set  of  test  points.  A  driver  program  supplies 
each  of  these  points  to  the  procedure  of  its 
respective axiom. The procedure, in turn,
indicates whether the axiom is satisfied.
Structural coverage of both the implementation and
the specification is computed. [Jalote89]
discusses an approach to generating test data to
verify the completeness of an algebraic
specification. Algebraic methods are applicable to
testing object-oriented programs.
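
The arrangement of axioms, test points, and a driver can be sketched as follows. This is not DAISTS itself: the stack implementation, the two axioms, the test points, and all names are hypothetical, and no structural coverage is computed.

```python
# Implementation under test: a stack represented as a Python list.
def new_stack():
    return []


def push(stack, item):
    return stack + [item]


def pop(stack):
    return stack[:-1]


def top(stack):
    return stack[-1]


# Each axiom is written as a procedure that reports whether the
# axiom holds at one test point.
def axiom_pop_push(stack, item):
    """pop(push(s, x)) = s"""
    return pop(push(stack, item)) == stack


def axiom_top_push(stack, item):
    """top(push(s, x)) = x"""
    return top(push(stack, item)) == item


TEST_POINTS = [(new_stack(), 1), (push(new_stack(), 1), 2)]
AXIOMS = {axiom_pop_push: TEST_POINTS, axiom_top_push: TEST_POINTS}


def run_driver(axioms):
    """Supply each test point to the procedure of its axiom."""
    return {
        axiom.__name__: all(axiom(*point) for point in points)
        for axiom, points in axioms.items()
    }


results = run_driver(AXIOMS)
```

An implementation of `pop` that removed the bottom element instead of the top would violate the first axiom at the second test point.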

(ii)  Axiomatic 

Despite the potential for widespread use of
predicate calculus as a specification language,
little has been published about deriving test
data from such specifications. [Gourlay83]
references work done on the relationship
between predicate calculus specifications and
path testing.

(iii)  State  machines 

Many programs can be specified as state
machines, thus providing additional means of
selecting test data [Beizer90]. Since the
equivalence problem of two finite automata is
decidable, testing can be used to decide whether a
program that simulates a finite automaton with
a bounded number of nodes is equivalent to the
one specified. This result can be used to test
those features of programs that can be specified
by finite automata, e.g., the control flow of a
transaction-processing system.

(iv)  Decision  tables 

Decision  tables  are  a  concise  method  of 
representing  an  equivalence  partitioning.  The 
rows  of  a  decision  table  specify  all  the  con¬ 
ditions  that  the  input  may  satisfy.  The  columns 
specify  different  sets  of  actions  that  may  occur. 
Entries  in  the  table  indicate  whether  the  actions 
should  be  performed  if  a  condition  is  satisfied. 
Typical  entries  are  “Yes,”  “No,”  or  “Don’t 
Care.” Each row of the table suggests significant
test data. Cause-effect graphs [Myers79]
provide  a  systematic  means  of  translating 
English  specifications  into  decision  tables, 
from  which  test  data  can  be  generated. 

2. Implementation-oriented testing

In  implementation-oriented  program  testing,  test 
data  selection  is  guided  by  information  derived  from 
the  implementation  [Howden75].  The  goal  is  to  en¬ 
sure  that  various  computational  characteristics  of  the 
software  are  adequately  covered.  It  is  hoped  that 
test  data  that  satisfy  these  criteria  have  higher  proba¬ 
bility  of  discovering  faults.  Each  execution  of  a  pro¬ 
gram  executes  a  particular  path.  Hence,  implemen¬ 
tation-oriented  testing  focuses  on  the  following 
questions:  What  computational  characteristics  are 
desirable  to  achieve?  What  paths  for  this  program 
achieve  these  characteristics?  What  test  data  will 
execute  those  paths?  What  are  the  computational 
characteristics  of  the  set  of  paths  executed  by  a 
given  test  set? 

Implementation-oriented  testing  addresses  the  fact 
that  only  the  program  text  reveals  the  detailed  deci¬ 
sions  of  the  programmer.  For  the  sake  of  efficiency, 
a  programmer  might  choose  to  implement  a  special 
case  that  appears  nowhere  in  the  specification.  The 
corresponding  code  will  be  tested  only  by  chance 
using  specification-oriented  testing,  whereas  use  of  a 
structural coverage measure such as statement
coverage  (see  below)  should  indicate  the  need  for 
test  data  for  this  case. 

Implementation-oriented testing schemes may be
classified according to two orthogonal axes: error
orientation and program view, discussed earlier in
Section II. A testing scheme’s error orientation is
the aspect of fault discovery that is emphasized:
execution, infection, or propagation (see Section II.5).
A testing scheme’s program view is the program
abstraction source that is used to determine desirable
computational characteristics: control flow, data
flow, or computation flow. Program view
emphasizes how a particular strategy works; error
orientation emphasizes the motivation behind the strategy
and helps one to better evaluate claims made about
the strategy.

The  subsequent  sections  are  organized  by  error  ori¬ 
entation.  Techniques  that  require  execution  of  par¬ 
ticular  program  elements  are  presented  first,  fol¬ 
lowed  by  those  that  attempt  to  force  infections,  and 
those  that  attempt  to  force  propagation.  It  should  be 
noted  that  infection  and  propagation  techniques  both 
require  execution,  and  some  techniques  emphasize 
all  three  conditions.  Within  each  section,  the  tech¬ 
niques  are  ordered  by  program  view:  control  flow, 
then  data  flow,  then  computation  flow. 

a.  Structure-oriented  testing 

A testing technique is structure-oriented if it
seeks test data that cause various structural
aspects of the program to be exercised. Assessing
the coverage achieved may involve instrumenting
the code to keep track of which parts of the
program are actually exercised during testing. The
low cost of such instrumentation has been
a prime motivation for adopting structure-oriented
techniques [Probert82]. Further motivation comes
from consideration of the consequences of
releasing a product without having executed all its parts
and having the customer discover faults in
untested code.

There  are  three  essential  components  to  be 
covered  in  structure-oriented  testing:  computa¬ 
tions,  branches,  and  data.  These  are  discussed 
below. 

(i)  Statement  testing 

Statement testing requires that every statement
in the program be executed. While it is
obvious that achieving 100% statement coverage
does not ensure a correct program, it is equally
obvious that anything less means that there is
code in the program that has never been
executed!

(ii)  Branch  testing 

Achieving 100% statement coverage does not
ensure that each branch in the program flow
graph has been executed. For example,
executing an if ... then statement (no else) when the
tested condition is true, tests only one of two
branches in the flow graph. Branch testing
seeks to ensure that every branch has been
executed [Huang75, Tai80]. Branch coverage can
be checked by probes inserted at points in the
program that represent arcs from branch points
in the flow graph [Probert82]. This
instrumentation suffices for statement coverage as well.
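
The difference between statement and branch coverage, and the use of probes to measure the latter, can be sketched as follows. The unit (which replaces negative values by zero), the probe labels, and the function names are hypothetical; the probes are inserted by hand rather than by an instrumentation tool.

```python
hits = set()


def probe(label):
    """Record that the arc carrying this probe was traversed."""
    hits.add(label)


def clamp_negative(x):
    """Instrumented version of: if x < 0 then x := 0"""
    if x < 0:
        probe("if-true")
        x = 0
    else:
        # The original statement has no else part; this arm exists
        # only to carry the probe for the second arc.
        probe("if-false")
    return x


BRANCH_PROBES = {"if-true", "if-false"}


def branch_coverage(test_inputs):
    """Fraction of branch arcs traversed by the test data."""
    hits.clear()
    for x in test_inputs:
        clamp_negative(x)
    return len(hits & BRANCH_PROBES) / len(BRANCH_PROBES)


one_input = branch_coverage([-5])
two_inputs = branch_coverage([-5, 5])
```

The single input -5 executes every statement of the original unit, yet it traverses only one of the two branches.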

(iii)  Data  coverage  testing 

In  some  programs,  a  portion  of  the  flow  control 
is  determined  by  the  data,  rather  than  by  the 
code.  Knowledge-based  applications,  some  AI 
applications,  and  table-driven  code  are  all  ex¬ 
amples  of  this  phenomenon.  Data  coverage 
testing  seeks  to  ensure  that  various  components 
of  the  data  are  “executed,”  i.e.,  they  are 
referenced  or  modified  by  the  interpreter  as  it 
executes.  Paralleling  statement  testing,  one 
can  ensure  that  each  data  location  is  accessed. 
Furthermore,  in  the  area  of  knowledge  bases, 
data  items  can  be  accessed  in  different  orders, 
so  it  is  important  to  cover  each  of  these  access 
orders.  These  access  sequences  are  analogous 
to  branch  testing. 

b.  Infection-oriented  testing 

A  testing  technique  is  considered  infection- 
oriented  if  it  seeks  to  establish  conditions  suitable 
for  infections  to  arise  at  locations  of  potential 
faults. This section characterizes several testing
techniques that require test data to force infections
if faults exist.

(i)  Conditional  testing 

In  conditional  testing,  each  clause  in  every 
condition  is  forced  to  take  on  each  of  its  pos¬ 
sible  values  in  combination  with  those  of  other 
clauses  [Huang75].  Conditional  testing  thus 
subsumes  branch  testing.  Instrumentation  for 
conditional  testing  can  be  accomplished  by 
breaking  compound  conditional  statements  into 
simple  conditions  and  nesting  the  resulting  if 
statements.  This  reduces  the  problem  of  con¬ 
ditional  coverage  to  the  simpler  problem  of 
branch  coverage,  enabling  algorithms  from  the 
control  flow  view  to  be  employed. 
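
The reduction of conditional coverage to branch coverage can be sketched for one compound condition. The condition `x > 0 and y > 0`, the labels, and the function names are hypothetical. Because the conjunction is evaluated left to right, the second clause is not evaluated when the first is false, so the nested form has four branches rather than four clause combinations.

```python
def compound(x, y):
    """Original form with a compound condition."""
    return x > 0 and y > 0


def nested(x, y, hits):
    """Equivalent form with simple conditions in nested ifs;
    each branch records the clause and the value it took."""
    if x > 0:
        hits.add(("x > 0", True))
        if y > 0:
            hits.add(("y > 0", True))
            return True
        hits.add(("y > 0", False))
        return False
    hits.add(("x > 0", False))
    return False


ALL_BRANCHES = {
    ("x > 0", True), ("x > 0", False),
    ("y > 0", True), ("y > 0", False),
}


def uncovered_branches(tests):
    """Branches of the nested form not exercised by the tests."""
    hits = set()
    for x, y in tests:
        # The transformation must not change the result.
        assert nested(x, y, hits) == compound(x, y)
    return ALL_BRANCHES - hits


missing_one = uncovered_branches([(1, 1)])
missing_none = uncovered_branches([(1, 1), (1, -1), (-1, 1)])
```

A single test in which both clauses are true covers the true branch of the compound condition but leaves the false outcome of each clause unexercised.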

(ii)  Expression  testing 

Expression  testing  [Hamlet77a]  requires  that 
every  expression  assume  a  variety  of  values 
during  a  test  in  such  a  way  that  no  expression 
can  be  replaced  by  a  simpler  expression.  If  one 
assumes  that  every  statement  contains  an  ex¬ 
pression  and  that  conditional  expressions  form 
a  proper  subset  of  all  the  program  expressions, 
then  this  form  of  testing  properly  subsumes  all 
the previously mentioned techniques.
Expression testing requires significant runtime
support for the instrumentation [Hamlet77b].

(iii)  Domain  testing 

The  input  domain  of  a  program  can  be  par¬ 
titioned  according  to  which  inputs  cause  each 
path  to  be  executed.  These  partitions  are  called 
path  domains.  Faults  that  cause  an  input  to  be 
associated  with  the  wrong  path  domain  are 
called  domain  faults.  Other  faults  are  called 
computation faults. (The terms used before
attempts were made to rationalize nomenclature
were domain errors and computation errors.)
The goal of domain testing is to discover
domain faults by ensuring that test data
limit the range of undetected faults [White80].
This is accomplished by selecting inputs close
to boundaries of the path domain. If the
boundary is incorrect, these points increase the
chance of an infection’s occurring. Domain
testing assumes coincidental correctness does
not occur, i.e., it assumes a program will fail if
an input follows the wrong path.* [Clarke82]
refines the fault detection capability of this
approach by requiring points to be selected that
further limit the amount a boundary can shift
without an infection’s occurring.


*The definition of coincidental correctness has since been broadened to
include any situation in which a fault is executed without an ensuing
failure.
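
As a small illustration of why points near a boundary matter, the sketch below compares a hypothetical intended function (spec) with a hypothetical implementation (impl) whose border has shifted from 10 to 12. Both functions and the chosen points are assumptions made for this example; it is not the point-selection strategy of [White80] or [Clarke82].

```python
def spec(x):
    # Intended behavior: inputs up to and including 10 take the "low" path.
    return "low" if x <= 10 else "high"

def impl(x):
    # Faulty implementation: the border has shifted from 10 to 12.
    return "low" if x <= 12 else "high"

def failing_inputs(tests):
    """Return the test inputs on which impl disagrees with spec."""
    return [x for x in tests if impl(x) != spec(x)]

interior_points = [0, 5, 100]    # well inside each path domain of impl
border_points = [12, 12.001]     # on, and just off, the border of impl
```

Here failing_inputs(interior_points) is empty, while failing_inputs(border_points) exposes the fault at x = 12. The two paths produce different outputs, so coincidental correctness cannot mask the wrong path.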


(iv) Perturbation testing

Perturbation  testing  attempts  to  determine  a 
sufficient  set  of  paths  to  test  for  various  faults 
in  the  code.  Faults  are  modeled  as  a  vector 
space,  and  characterization  theorems  describe 
when  sufficient  paths  have  been  tested  to  dis¬ 
cover  both  computation  and  domain  errors. 
Additional  paths  need  not  be  tested  if  they  can¬ 
not  reduce  the  dimensionality  of  the  error  space 
[Zeil83,  Zeil88]. 

(v)  Fault  sensitivity  testing 

[Foster80] describes a method for selecting test
data that are sensitive to faults. Howden has
formalized this approach in a method called
weak mutation testing [Howden82]. Rules for
recognizing fault-sensitive data are described
for each primitive language construct.
Satisfaction of a rule for a given construct during
testing means that all alternate forms of that
construct have been distinguished. This has an
obvious advantage over mutation testing
(discussed later): elimination of all mutants
without generating a single one! Some rules
even allow for infinitely many mutants.
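
A rule of this kind can be sketched for one construct, the relational operator. The code below is an illustrative assumption rather than Howden's formulation: it checks, from the operand values observed during testing, which alternate operators have not yet been distinguished from the one in the program. No mutant program is ever built or run.

```python
import operator

RELATIONAL = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
              "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

def undistinguished(original, states):
    """Alternate operators that agree with `original` on every observed
    operand pair (a, b); an empty result means the rule is satisfied."""
    orig = RELATIONAL[original]
    return sorted(op for op, f in RELATIONAL.items()
                  if op != original
                  and all(f(a, b) == orig(a, b) for a, b in states))
```

For the expression a < b, the single observation (1, 2) leaves "!=" and "<=" undistinguished; observing a < b, a == b, and a > b at least once each distinguishes every alternate operator.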

c.  Propagation-oriented  testing 

A  testing  technique  is  considered  propagation- 
oriented  if  it  seeks  to  ensure  that  potential  infec¬ 
tions  propagate  to  failures.  This  requires  select¬ 
ing  paths  to  test  based  on  their  propagation  char¬ 
acteristics. 

(i)  Path  testing 

In  path  testing,  data  are  selected  to  ensure  that 
all  paths  of  the  program  have  been  executed 
[Howden76].  In  practice,  of  course,  such 
coverage  is  impossible  to  achieve,  for  a  variety 
of reasons. First, any program with an
indefinite loop contains infinitely many paths,
one for each iteration of the loop. Thus, no
finite  set  of  data  will  execute  all  paths.  The 
second  difficulty  is  the  infeasible  path  prob¬ 
lem:  it  is  undecidable  whether  an  arbitrary  path 
in  an  arbitrary  program  is  executable.  At¬ 
tempting  to  generate  data  for  such  infeasible 
paths  is  futile,  but  it  cannot  be  avoided.  Third, 
it  is  undecidable  whether  an  arbitrary  program 
will  halt  for  an  arbitrary  input.  It  is  therefore 
impossible  to  decide  whether  a  path  is  finite  for 
a  given  input. 

In  response  to  these  difficulties,  several 
simplifying  approaches  have  been  proposed. 
Infinitely  many  paths  can  be  partitioned  into  a 
finite  set  of  equivalence  classes  based  on  char¬ 
acteristics  of  the  loops.  Boundary  and  interior 
testing  requires  executing  loops  zero  times,  one 
time,  and,  if  possible,  the  maximum  number  of 
times [Howden75]. Linear code sequence and
jump (LCSAJ) criteria [Woodward80] specify a
hierarchy of successively more complex path
coverages. [Howden78a], [Tai80], [Gourlay83],
[Weyuker86], and [Ntafos88] suggest methods
of studying the adequacy of path testing.
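
The loop-based equivalence classes mentioned above can be illustrated with a minimal sketch. The function total and the assumed maximum length MAX_LEN are hypothetical; the point is only that three inputs stand in for the unbounded family of loop paths.

```python
MAX_LEN = 4   # assumed upper bound on input length for this illustration

def total(values):
    """Sum a list and report how many times the loop body ran."""
    result = 0
    iterations = 0
    for v in values:
        result += v
        iterations += 1
    return result, iterations

# One representative input per equivalence class of loop paths.
loop_tests = {
    "zero iterations": [],
    "one iteration": [7],
    "maximum iterations": [1, 2, 3, 4],
}
```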

Path coverage does not imply condition
coverage or expression coverage, since an
expression may appear on multiple paths but
some subexpressions may never assume more
than one value. For example, in

if a ∨ b then S1 else S2

b may be false in every test and yet each path
may still be executed.
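
The same observation in executable form, as a minimal sketch with hypothetical names: two tests execute both paths while b is false throughout.

```python
def branch(a, b):
    # Corresponds to: if a or b then S1 else S2
    if a or b:
        return "S1"
    return "S2"

tests = [(True, False), (False, False)]
paths_taken = {branch(a, b) for a, b in tests}   # both paths are executed
values_of_b = {b for _, b in tests}              # b never becomes True
```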

(ii)  Compiler-based  testing 

In [Hamlet77a] and [Hamlet77b], a compiler
augmented to judge the adequacy of test data is
described. Input-output pairs are encoded as a
comment in a procedure, as a partial
specification of the function to be computed by that
procedure. The procedure is then executed for
each of the input values and checked for the
output values. The test is considered adequate
only if each computational or logical
expression in the procedure is determined by the test;
i.e., no expression can be replaced by a simpler
expression and still pass the test. Simpler is
defined in a way that allows only finitely many
substitutions. Thus, as the procedure is
executed, each possible substitution is evaluated
on the data state presented to the expression.
Those that do not evaluate the same as the
original expression are rejected. Substitutions that
evaluate the same, but ultimately produce
failures, are likewise rejected.
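
The first rejection step can be sketched as below. This is a simplified illustration under stated assumptions: the set of "simpler" expressions is supplied by hand, and only the local value of the expression is compared, so the second check described above (substitutions that evaluate the same but later cause failures) is not modeled.

```python
def surviving_substitutions(original, simpler, states):
    """Names of simpler expressions that evaluate the same as `original`
    on every data state; a non-empty result marks the test inadequate."""
    return [name for name, f in simpler.items()
            if all(f(**s) == original(**s) for s in states)]

# Expression under test and two hypothetical simpler replacements.
original = lambda x, y: x + y
simpler = {"x": lambda x, y: x, "y": lambda x, y: y}
```

With the single data state x = 3, y = 0, the expression x + y could be replaced by x without any visible change, so the test does not determine the expression; adding the state x = 0, y = 2 rejects that substitution as well.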

(iii)  Data  flow  testing 

Data flow analysis can form the basis for
testing, exploiting the relationship between points
where variables are defined and points where
they are used [Frankl88, Laski83, Ntafos84,
Ntafos88, Podgurski90, Rapps85]. By insisting
on the coverage of various definition-use
pairs,* data flow testing establishes some of the
conditions necessary for infection and partial
propagation. The motivation behind data flow
testing is that test data are inadequate if they do
not exercise these various def-use
combinations. It is clear that an incorrect definition that
is never used during a test will not be caught
by that test. Similarly, if a given location
incorrectly uses a particular definition, but that
combination is never tried during a test, the
fault will not be detected.


*If a variable x is defined at location A, referenced (or used) at location
B, and there is a path from A to B with no intervening definition of x,
then (A, B) is a definition-use pair.

Data flow connections may be determined
statically [Rapps85] or dynamically [Korel88b].
Some connections may be infeasible due to the
presence of infeasible subpaths. Heuristics
may be developed for generating test data
based on data flow information [Korel88b].
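
A static determination of definition-use pairs, and a check of which pairs an executed path covers, can be sketched as follows. The control flow graph, the node numbering, and both function names are assumptions made for this example; the sketch ignores feasibility and is not the algorithm of [Rapps85] or [Korel88b].

```python
def def_use_pairs(succ, defs, uses, var):
    """Pairs (d, u) such that a definition-clear path for `var`
    leads from defining node d to using node u."""
    pairs = set()
    for d in (n for n in succ if var in defs.get(n, ())):
        stack, seen = list(succ[d]), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if var in uses.get(n, ()):
                pairs.add((d, n))
            if var not in defs.get(n, ()):   # a redefinition ends the path
                stack.extend(succ[n])
    return pairs

def covered_pairs(path, pairs, defs, var):
    """Pairs exercised by one executed path, given as a list of nodes."""
    hit, last_def = set(), None
    for n in path:
        if last_def is not None and (last_def, n) in pairs:
            hit.add((last_def, n))
        if var in defs.get(n, ()):
            last_def = n
    return hit

# Hypothetical unit: 1: x = read(); 2: if c; 3: x = 0; 4: print(x)
succ = {1: [2], 2: [3, 4], 3: [4], 4: []}
defs = {1: {"x"}, 3: {"x"}}
uses = {4: {"x"}}
```

For this graph the pairs are (1, 4) and (3, 4). A test that follows path 1, 2, 4 covers only the first, so a second test through node 3 is needed before both definitions have been seen in use.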

(iv)  Mutation  testing 

Mutation  testing  uses  mutation  analysis  to 
judge  the  adequacy  of  test  data.  The  test  data 
are  judged  adequate  only  if  each  mutant  is  ei¬ 
ther  functionally  equivalent  to  the  original  pro¬ 
gram  or  computes  output  different  from  the 
original  program  on  the  test  data.  Inadequacy 
of  the  test  data  implies  that  certain  faults  can  be 
introduced  into  the  code  and  go  undetected  by 
the  test  data. 

Mutation testing is based on two hypotheses
[DeMillo78a]. The competent programmer
hypothesis says that a competent programmer
will write code that is close to being correct;
the correct program, if not the current one, can
be produced by some straightforward syntactic
changes to the code. The coupling effect
hypothesis says that test data that reveal simple
faults will uncover complex faults. Thus, only
single mutants need be eliminated, and
combinatoric effects of multiple mutants need not
be considered [DeMillo78a]. [Gourlay83]
formally characterizes the competent programmer
hypothesis as a function of the probability of
the test set’s being reliable (as defined by
Gourlay) and shows that under this
characterization, the hypothesis does not hold.
Empirical justification of the coupling effect has
been attempted [Budd80, DeMillo78a, Offutt89],
and theoretical analysis has shown that it may
hold probabilistically, but not universally
[Gourlay83, Morell88].
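
The adequacy judgment can be illustrated with a small sketch. The unit max_of and its three hand-written single mutants are hypothetical; an actual mutation system would generate mutants automatically from a set of mutation operators.

```python
def max_of(a, b):
    return a if a > b else b

# Hand-written single mutants of max_of.
mutants = {
    "a >= b": lambda a, b: a if a >= b else b,   # equivalent to the original
    "a < b":  lambda a, b: a if a < b else b,
    "else a": lambda a, b: a if a > b else a,
}

def live_mutants(tests):
    """Mutants whose output matches the original on every test input."""
    return [name for name, m in mutants.items()
            if all(m(a, b) == max_of(a, b) for a, b in tests)]
```

The single test (2, 1) leaves two mutants alive. Adding (1, 2) eliminates "else a"; the remaining mutant "a >= b" is functionally equivalent to the original, so the two tests together are adequate with respect to these mutants.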

3. Error-oriented testing

Testing  is  necessitated  by  the  potential  presence  of 
errors  in  the  programming  process.  Techniques  that 
focus  on  assessing  the  presence  or  absence  of  errors 
in  the  programming  process  are  called  error- 
oriented. 

a.  Error-based  testing 

Error-based  testing  seeks  to  demonstrate  that  cer¬ 
tain  errors  have  not  been  committed  in  the  pro¬ 
gramming  process  [Weyuker80].  Error-based  test¬ 
ing  can  be  driven  by  histories  of  programmer  er¬ 
rors,  measures  of  software  complexity,  knowl¬ 
edge  of  error-prone  syntactic  constructs,  or  even 
error  guessing  [Myers79]. 

Error-based testing begins with the programming
process, identifies potential errors in that process,
and then asks how those errors are reflected as
faults. It then seeks to demonstrate the absence of
those faults. Howden has classified errors into
two categories, abstraction and decomposition,
and he has developed specific techniques for
addressing these categories [Howden89, Howden90].

b.  Fault-based  testing 

Fault-based  testing  aims  at  demonstrating  that 
certain  prescribed  faults  are  not  in  the  code. 

Fault-based  testing  methods  differ  in  both  extent 
and  breadth.  One  with  local  extent  demonstrates 
that  a  fault  has  a  local  effect  on  computation;  it  is 
possible  that  this  local  effect  will  not  produce  a 
program  failure.  A  method  with  global  extent 
demonstrates  that  a  fault  will  cause  a  program 
failure.  Breadth  is  determined  by  whether  the 
technique  handles  a  finite  or  an  infinite  class  of 
faults. Extent and breadth are orthogonal.
Infection- and propagation-oriented techniques
could be classified as fault-based if they are
interpreted as seeking to demonstrate the absence of
particular faults. Infection-oriented techniques
are of local extent.

[Morell88] and [Morell90] define a fault-based
method based on symbolic execution that permits
elimination of infinitely many faults through
evidence of global failures. Symbolic faults are
inserted into the code, which is then executed on
real or symbolic data. Program output is then an
expression in terms of the symbolic faults. It thus
reflects how a fault at a given location will impact
the program’s output. This expression can be
used to determine actual faults that could not have
been substituted for the symbolic fault and remain
undetected by the test.
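
The idea can be illustrated with a heavily simplified sketch, which should not be read as Morell's method: a single symbolic fault F replaces one constant, and outputs are restricted to expressions that are linear in F. The class Lin, the unit program, and undetected_values are all hypothetical.

```python
from fractions import Fraction

class Lin:
    """A value of the form coef*F + const, where F is the symbolic fault."""
    def __init__(self, coef, const):
        self.coef, self.const = Fraction(coef), Fraction(const)
    def __add__(self, k):        # Lin + number
        return Lin(self.coef, self.const + k)
    def __mul__(self, k):        # Lin * number
        return Lin(self.coef * k, self.const * k)

F = Lin(1, 0)

def program(x, slope=2):
    # Stands in for y = 2*x + 3; pass slope=F to plant a symbolic fault.
    return slope * x + 3

def undetected_values(symbolic_output, expected):
    """Values of F for which the test would still pass."""
    if symbolic_output.coef == 0:
        return "all" if symbolic_output.const == expected else "none"
    return [(expected - symbolic_output.const) / symbolic_output.coef]
```

Executing program(5, slope=F) yields 5F + 3. Since the expected output is 13, only F = 2 passes, so this one test eliminates every other constant that could have been written in place of 2. The test x = 0 yields 3 regardless of F and therefore eliminates nothing.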

c.  Probable  correctness 

Probable correctness is defined by Hamlet to be
the probability that no faults exist in a tested
program [Hamlet87]. Early intimations of this
concept may be found in [DeMillo78b] and
[Rowland81],  where  particular  classes  of  functions 
have  members  that  can  be  distinguished  from 
other  members  by  a  finite  test  set.  As  such,  each 
successful  execution  increases  confidence  that  the 
implemented  function  is  correct.  [Hamlet90]  ex¬ 
plores  the  concept  when  no  a  priori  bound  can  be 
placed  on  the  number  of  executions  needed.  He 
bounds  the  number  of  inputs  needed  to  obtain 
high  confidence  in  a  high  probability  of  correct¬ 
ness. 

4.  Hybrid  Testing  Techniques 

Since it is apparent that no one testing technique is
sufficient, some experts have investigated ways of
combining several techniques. Such integrated
techniques are called hybrid testing techniques. These
are not just the concurrent application of distinct
techniques; they are characterized by a deliberate
attempt to incorporate the best features of different
methods into a single new technique.

In partition analysis, test data are chosen to ensure
simultaneous coverage of both the specification and
code [Richardson85]. An operational specification
language has been designed that enables a structural
measure of coverage of the specification. The input
space is partitioned into a set of domains that is
formed by the cross product of path domains of the
specification and path domains of the program. Test
data are selected from each non-empty partition,
ensuring simultaneous coverage of both specification
and code. Proof of correctness techniques can also
be applied to these cross product domains.
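
The cross product construction can be sketched over a small finite input range. The specification and program path domains below are hypothetical predicates for an absolute-value unit, and select_tests is an assumed helper; the operational specification language of [Richardson85] is not modeled.

```python
spec_domains = {          # path domains of a hypothetical specification
    "spec: x < 0":  lambda x: x < 0,
    "spec: x >= 0": lambda x: x >= 0,
}
prog_domains = {          # path domains of a hypothetical implementation
    "prog: x <= 0": lambda x: x <= 0,
    "prog: x > 0":  lambda x: x > 0,
}

def select_tests(inputs):
    """Pick one representative input from each non-empty cell of the
    cross product of specification and program path domains."""
    chosen = {}
    for x in inputs:
        for s, in_spec in spec_domains.items():
            for p, in_prog in prog_domains.items():
                if in_spec(x) and in_prog(x):
                    chosen.setdefault((s, p), x)
    return chosen
```

Over the inputs -2 through 2, three of the four cells are non-empty. The cell where the specification takes its x >= 0 path and the program takes its x <= 0 path contains only x = 0, an input that neither specification coverage nor code coverage alone would be forced to select.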

The testing system DAISTS predates and automates
this technique for algebraic specifications of abstract
data types, but it does not include any notion of
proof of correctness [Gannon81]. Furthermore, the
emphasis in DAISTS is on test data evaluation,
rather than generation. [Goodenough75] presents a
less formal, integrated scheme for selecting test data
based on analysis of sources of errors in the
programming process. [Richardson89] applies process
programming to the problem of interacting testing
techniques.

IV.  Evaluating  Unit  Analysis  and  Testing 
Techniques 

The effectiveness of unit analysis and testing may be
evaluated on theoretical or empirical grounds
[Howden78a]. Theory seeks to understand what can be
done in principle; empirical evaluation seeks to
establish what techniques are useful in practice. Theory
formally defines the field and investigates its
fundamental limitations. For example, it is well known that
testing cannot demonstrate the correctness of an
arbitrary program with respect to an arbitrary
specification. This does not mean, however, that testing can
never verify correctness; indeed, in some cases it can
[Howden78c, Tai80]. Empirical studies evaluate the
utility of various practices. Though statement testing is
theoretically deficient, it is immensely useful in
practice, exposing many program faults.

IEEE  has  sponsored  several  workshops  on  testing, 
analysis,  and  verification.  The  National  Bureau  of 
Standards  has  issued  a  special  publication  that  de¬ 
scribes  many  of  the  techniques  mentioned  in  this  mod¬ 
ule  and  characterizes  each  approach  according  to  effec¬ 
tiveness,  applicability,  learning,  and  cost  [Powell82]. 
The Strategic Defense Initiative Organization
commissioned a study of the state of the art in verification
techniques that resulted in an extensive overview of the
field [Youngblut89] and an in-depth annotated
bibliography [Brykczynski89].

1. Theoretical evaluation

Theory serves three fundamental purposes: to define
terminology, to characterize existing practice, and to
suggest new avenues of exploration. Unfortunately,
historical terminology is inconsistent. A simple
example is the word reliable, which is used by authors
in related, but diverse ways. (Compare, for
example, [Duran81], [Goodenough75], [Howden76],
and [RichardsonSS], and do not include any literature
from reliability theory!) [IEEE83] is an appropriate
starting point for examining terminology, but it is
imprecise in places and was established many years
after certain (in retrospect, unfortunate) terminology
had become accepted.* Theoretical treatments of
topics in program testing are ever expanding.
Goodenough and Gerhart, in [Goodenough75], made
an attempt to rationalize terminology, though this
work has been criticized, particularly in
[Weyuker80]. Nevertheless, they anticipated the vast
majority of practical and theoretical issues that have
since evolved in program testing. [Goodenough75]
is therefore required reading. Howden and Weyuker
have both written theoretical expositions on
specification-, implementation-, and error-oriented
testing [Howden76, Howden78a, Howden78c,
Howden82, Howden86, Rapps85, Weyuker80,
Weyuker82, Weyuker84, Weyuker86]. Theoretical
expositions of mutation and fault-based testing are
found in [Budd80], [Budd82], [Cherniavsky87],
[Davis88], [Gourlay83], [Hamlet77a], [Morell88], and
[Morell90]. [Rowland81], [Clarke89], [Podgurski90],
and [Zeil88] present frameworks for understanding
data flow testing.

2.  Empirical  evaluation 

Empirical studies provide benchmarks by which to
judge existing testing techniques. An excellent
comparison of techniques is found in [Howden80a],
which emphasizes the complementary benefits of
different testing methods applied to scientific
programs. Empirical studies of mutation testing are
discussed in [Budd80]. [Basili87] compares the
effectiveness of code-reading, specification-oriented
testing, and implementation-oriented testing.
[Weyuker88] discusses an empirical study of the
complexity of data flow testing.

Many papers discussing experience with testing
techniques can be found in conference proceedings,
especially proceedings of the following:

• ACM Symposium on Principles of Programming
Languages

• International Conference on Software
Engineering

• Computer Software and Applications
Conference

• Testing, Analysis, and Verification
Conference

• International Conference on Testing
Computer Software

• Software Testing and Review Conference


*For instance, the terminology error-based testing and error seeding
became well-established long before the standard told us to use fault.

V.  Managerial  Aspects  of  Unit  Analysis  and  Testing 

Administration of unit analysis and testing proceeds in
two stages. First, techniques appropriate to the project
must be selected. These techniques must then be
applied systematically. [IEEE87] provides explicit
guidance for these steps.

1.  Selecting  techniques 

Selecting the appropriate techniques from the array
of possibilities is a complex task that requires
assessment of many issues, including the goal of
testing, the nature of the software product, and the
nature of the test environment. It is important to
remember the complementary benefits of the various
techniques and to select as broad a range of
techniques as possible, within constraints of time, cost,
etc. No single analysis or testing technique is
sufficient [Gerhart76]. Specification-oriented testing
may suffer from inadequate code coverage,
implementation-oriented testing may suffer from
inadequate specification coverage, and neither
technique guarantees the benefits of error-oriented
coverage.

2.  Goals 

Different design goals impose different demands on
the selection of testing techniques. Achieving
correctness requires use of a great variety of techniques.
A goal of reliability implies the need for statistical
testing using test data representative of those of the
anticipated user environment. It should be noted,
however, that statistical testing still requires
judicious use of “selective” tests to avoid embarrassing
or disastrous situations. Testing may also be
directed toward assessing the usability of software.
This kind of testing requires a solid foundation in
human factors [Perlman90]. Performance of the
software may also be of special concern. In this
case, extremal testing is essential. Timing
instrumentation can prove useful.

Often, several of these goals must be achieved
simultaneously. Recent attempts in process
programming have sought to address this issue
[Richardson89]. One approach to testing under these
circumstances is to order testing by decreasing
benefit. For example, if reliability, correctness, and
performance are all desired features, it is reasonable to
tackle performance first, reliability second, and
correctness third, since these goals require increasingly
difficult-to-design tests. This approach can have the
beneficial effect of identifying faulty code with less
effort expended.

3.  Nature  of  the  product 

The  nature  of  the  software  product  plays  an  impor¬ 
tant  role  in  the  selection  of  appropriate  techniques. 
Four representative types of software products are
discussed below.

a.  Data  processing 

Data processing applications appear to benefit
from most of the techniques described in this
module. Conventional languages such as
COBOL are frequently used, increasing the
likelihood of finding an instrumented compiler for
doing performance and coverage analysis.
Functional test cases are typically easy to identify
[Redwine83]. Even domain testing, with its many
restrictions, seems applicable, since most
predicates in data processing programs are linear
[White80].

b.  Scientific  computation 

Howden analyzed a variety of verification
techniques on the IMSL routines [Howden80a]. He
concluded that functional and structure-oriented
testing are complementary, that neither is
sufficient, and that sometimes a hybrid approach is
necessary to cover extremal values while
simultaneously executing a particular path. Static
verification methods found fewer errors in Howden’s
study, but their earlier application in the life cycle
may increase their effectiveness. Extremal value
testing and special value testing are vital to
scientific programs. Statistical testing is perhaps less
appropriate, since these programs are frequently
constructed to solve problems whose
characteristics are not known in advance. The IMSL
package illustrates this; the designers of the package
cannot make “reasonable” assumptions about the
distribution of arguments to the sine routine, for
instance.

c.  Expert  systems 

Expert  systems  pose  unique  challenges  to  verifi¬ 
cation.  Coverage  of  the  executable  code  is  of 
little  use,  since  the  behavior  of  the  system  is  dom¬ 
inated  by  the  knowledge  base.  Difficulties  arise 
in  assuring  the  consistency  of  this  knowledge 
base.  This  problem  is  compounded  by  the  reli¬ 
ance  on  human  experts,  since  precise  behavior  is 
difficult to specify. A good survey of the
problems related to validation of expert systems
appears in [Hayes-Roth83].

Three steps can be identified as minimal
requirements for verification of an expert system. First,
it is necessary to clean up the knowledge base in
much the same manner as is done for a BNF
grammar. Inconsistencies must be detected,
redundancies eliminated, loops broken, etc.
Symbolic execution and data flow analysis appear to
be applicable to this stage. Second, each piece of
information in the knowledge base must be
exercised. Mutation analysis applied to the
knowledge base detects the information whose change
does not affect the output and, thus, is not
sufficiently exercised. Third, test case design and
evaluation must be conducted by experts in the
application domain.

d.  Embedded  and  real-time  systems 

Embedded and real-time systems are perhaps the
most complex systems to specify, design, and
implement. It is no surprise that they are
particularly hard to verify [Carver91, Tai91, Weiss88].
Embedded computer systems typically have
inconvenient interfaces for defining and conducting
tests. Ultimately, the code must execute on the
embedded computer in its operational
environment. Operational testing is performed in this
environment.

Unit  testing  in  an  operational  environment  is 
rarely  possible.  The  equipment  is  seldom  avail¬ 
able  and  may  lack  conventional  input  and  output. 
In  these  cases,  the  embedded  computer  can  be 
placed  in  a  controlled  environment  that  simulates 
the  operational  one.  This  provides  the  capability 
of  conducting  a  system  test.  Timing  constraints 
must  be  verified  here.  To  assess  time-critical 
software,  it  is  essential  to  collect  data  in  as  un¬ 
obtrusive  a  manner  as  possible.  Typically,  this 
requires  hardware  instrumentation,  though  soft¬ 
ware  breakpoints  sometimes  suffice.  Data  from 
several  points  of  instrumentation  must  be  coor¬ 
dinated  and  analyzed;  such  a  process  is  called 
data  reduction. 

If  the  simulated  environment  does  not  support 
unit  testing,  the  embedded  computer  itself  must 
be  abstracted.  The  software  can  be  written  in 
assembly  language,  and  the  embedded  computer 
can  be  simulated  on  another  machine;  or  the  soft¬ 
ware  can  be  written  in  a  high-level  language,  such 
as  Ada,  which  can  be  cross-compiled  to  the  target 
machine.  At  this  level  of  abstraction,  unit  anal¬ 
ysis  and  testing  are  possible.  The  goal  during  this 
stage  is  to  assess  correctness  of  individual  units. 
Functional  testing  is  essential,  especially  ex¬ 
tremal,  midrange,  and  special  value  testing,  since 
it  is  impossible  to  ensure  these  tests  will  occur 
during  integration  or  system  testing.  Data  flow 
analysis  of  the  code,  especially  if  the  system  is 
written in assembly language, is appropriate. A
simulator can be instrumented to collect
necessary code coverage statistics and enable replaying
of the previous executions [Carver91, Tai91].
Static analysis of concurrency based on symbolic
execution is described in [Young88].

4.  Nature  of  the  testing  environment 

Available  resources,  personnel,  and  project  con¬ 
straints  must  be  considered  in  selecting  testing  and 
analysis  strategies. 

a.  Available  resources 

Available resources frequently determine the
extent of testing. If the compiler does not
instrument code, if data flow analysis tools are not at
hand, if exotic tools for mutation testing or
symbolic evaluation are not available, one must
perform functional testing and instrument the code
by hand to detect branch coverage. Hand
instrumentation is not difficult, but it is an
error-prone and time-consuming process. Editor scripts
can aid in this process. If resources permit,
successively more complex criteria involving branch
testing, data flow testing, domain testing, and
fault-based testing can be tried.

b.  Personnel 

No technique is without its personnel costs.
Before introducing any new technique or tool, the
impact on personnel must be considered. The
advantages of any approach must be balanced
against the effort required to learn the technique,
the ongoing time demands of applying it, and the
expertise it requires. Domain testing can be quite
difficult to learn. Data flow analysis may uncover
many anomalies that are not errors, thereby
requiring personnel to sort through and distinguish
them. Special value testing requires expertise in
the application area. Analysis of the impact on
personnel for many of the techniques in this
module can be found in [Powell82].

c.  Project  constraints 

The goal in selecting analysis and testing
techniques is to obtain the most benefit from testing
within the project constraints. Testing is indeed
over when the budget or the time allotted to it is
exhausted, but this is not an appropriate definition
of when to stop testing [Myers79]. Estimates
indicate that approximately 40% of software
development time is used in the testing phase.
Scheduling must reflect this fact.

5.  Control 

To ensure quality in unit analysis and testing, it is
necessary to control both documentation and the
conduct of the test.

a.  Configuration  control 

Several types of items from unit analysis and
testing should be placed under configuration
management, including the test plan, test procedures, test
data, and test results. A formal description of
these and related items is found in [IEEE83]. The
test plan specifies the goals, environment, and
constraints imposed on testing. The test
procedures detail the step-by-step activities to be
performed during the test. Regression testing occurs
when previously saved test data are used to test
modified code. Its principal importance is that it
ensures previously attained functionality has not
been lost during a modification. Test results are
recorded and analyzed for evidence of program
failures. Software with a history of frequent
failures may be a candidate for redesign or
reimplementation.
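
Regression testing with saved test data can be sketched as below. The unit discount, the saved cases, and the helper regressions are hypothetical; in practice the saved data would be retrieved from the configuration management system rather than written inline.

```python
def discount(price, percent):
    """Unit under test; imagine it has just been modified."""
    return round(price * (100 - percent) / 100, 2)

# Test data saved from an earlier, accepted run: inputs plus the
# outputs recorded at that time.
SAVED = [
    {"args": (200, 10), "expected": 180.0},
    {"args": (50, 0), "expected": 50.0},
]

def regressions(unit, saved):
    """Saved cases whose recorded output the unit no longer reproduces."""
    return [case for case in saved if unit(*case["args"]) != case["expected"]]
```

An empty result from regressions(discount, SAVED) indicates that the previously attained behavior survived the modification; any returned case identifies functionality that was lost.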

b.  Conducting  tests 

A test bed is an integrated system for testing
software. Many such systems exist as commercial
products. Minimally, they provide the ability to
define a test case, construct a test driver, execute
the test case, and capture the output. Additional
facilities provided by such systems typically
include data flow analysis, structural coverage
assessment, regression testing, test specification,
and report generation. [Frankl88] describes
ASSET, a system for analyzing data flow coverage
of a test case.


Glossary 


The following terminology is used throughout the
module, except possibly in the abstracts in the
bibliography. Additional terms are defined in the text.
Note that older literature is replete with
inconsistencies in the use of such terms as “error,” “failure,”
and “fault.” Consistent use of these terms has been
attempted here, but such consistency may itself lead
to confusion in the many cases where “modern”
usage conflicts with prior usage in the literature.

adequacy  criteria 

Conditions  that  must  be  satisfied  before  testing 
is  considered  complete. 

analysis 

The  process  of  determining  software  character¬ 
istics.  Analysis  is  dynamic  if  it  is  execution- 
based,  otherwise  it  is  static. 

coverage 

Used  in  conjunction  with  a  software  feature  or 
characteristic,  the  degree  to  which  that  feature  or 
characteristic  is  tested  or  analyzed.  Examples 
include input domain coverage, statement
coverage, branch coverage, and path coverage.

error 

A  human  action  that  produces  an  incorrect  result 
[IEEE90]. 

failure 

The  inability  of  a  system  or  component  to  per¬ 
form  its  required  functions  within  specified  per¬ 
formance  requirements  [IEEE90]. 

fault 

An  incorrect  step,  process,  or  data  definition  in  a 
computer program [IEEE90].

oracle 

A  mechanized  procedure  that  decides  whether  a 
given  input-output  pair  is  acceptable. 
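
An oracle in this sense can be as small as a predicate over an input-output pair. The following sketch assumes a hypothetical sorting unit; the function name `sort_oracle` is an assumption made for illustration. The oracle judges a pair without prescribing how the output was computed.

```python
from collections import Counter


def sort_oracle(given, produced):
    """Accept the pair iff `produced` is an ordered permutation of `given`."""
    ordered = all(a <= b for a, b in zip(produced, produced[1:]))
    same_elements = Counter(given) == Counter(produced)
    return ordered and same_elements
```

For example, the pair `([3, 1, 2], [1, 2, 3])` is accepted, while `([3, 1, 2], [1, 2])` is rejected because an element has been lost.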

software  characteristic 

An  inherent,  possibly  accidental,  trait,  quality,  or 
property  of  software  [IEEE87]. 

software  feature 

A  software  characteristic  specified  or  implied  by 
requirements  documentation  [IEEE87]. 


test  bed 

An  environment  containing  the  hardware,  in¬ 
strumentation,  simulators,  software  tools,  and 
other  support  elements  needed  to  conduct  a  test 
[IEEE90]. 

test  data 

Data developed to test a system or system component
[IEEE83].

testing 

Verifying  a  system  or  component  using  software 
characteristics  derived  from  dynamic  analysis. 

unit 

A  software  element  that  can  be  treated  meaning¬ 
fully  as  a  whole. 

verification 

The  process  of  determining,  for  a  system  or 
component,  whether  the  products  of  a  given  de¬ 
velopment  phase  satisfy  the  conditions  imposed 
at  the  start  of  that  phase.  (Adapted  from 
[IEEE90].) 
Figure: Software Verification Model




Teaching  Considerations 


Textbooks 


There has been an explosion of books on the subjects of
analysis and testing, only a few of which will be mentioned
here. [Myers79] is dated but provides a good overview of
structural coverage and some specification-based testing.
It can still serve well as a supplementary text in an
introductory software engineering course. [Beizer90] is
eclectic, containing more testing techniques than any other
reference. The text is written in a captivating style and
makes frequent appeals to the experiences of actual
projects. [Howden87] is the first text to approach analysis
and testing within a unified framework. It contains all the
necessary theoretical and practical background, and could be
used as a text for a graduate seminar. [Ould86] succinctly
places unit analysis and testing in its context within the
overall verification effort.

To  gain  a  full  appreciation  of  the  important  issues, 
each  of  these  texts  must  be  supplemented  with  read¬ 
ings  from  the  current  literature.  Suggested  Reading 
Lists  (page  22)  contains  a  table  categorizing  entries 
in  the  annotated  bibliography  according  to  their  po¬ 
tential  use. 


Suggested  Schedules 


The  following  are  suggestions  for  using  the  material 
in  this  module  in  various  classroom  contexts.  Num¬ 
bers  in  parentheses  represent  suggested  lecture  hours 
to  be  allocated  to  each  topic. 

One-Term  Undergraduate  Introduction  to  Soft¬ 
ware  Engineering.  The  large  quantity  of  material 
to  be  covered  in  this  course  makes  it  difficult  to  deal 
with  any  topic  in  depth.  The  following  minimum 
coverage  of  unit  analysis  and  testing  issues  is  sug¬ 
gested: 

•  Theory  (0.5) 

•  Program  Views  and  Related  Analyses  (1.5) 

•  Specification-Based,  Implementation-Based, 
and  Error-Based  Testing  (1.5) 

•  Managerial  Aspects  (1.5) 

Total:  5.0  hours 


Undergraduate  Course  on  Verification  Tech¬ 
niques.  A  course  covering  proof  of  correctness,  re¬ 
view  techniques,  and  analysis  and  testing  provides  a 
springboard  for  understanding  the  complicated  is¬ 
sues  of  verification.  Suggested  coverage: 

•  Theory  (1.5)

•  Program  Views  and  Related  Analyses  (4.5) 

•  Specification-Based  Testing  (3.0) 

•  Implementation-Based  Testing  (4.5) 

•  Error-Based  Testing  (3.0)

•  Managerial  Aspects  (1.5) 

Total:  18  hours 

Graduate  Seminar  on  Analysis  and  Testing.  As 
indicated  in  Suggested  Reading  Lists,  there  is  a 
wealth  of  material  to  support  a  graduate  seminar  in 
testing.  The  entire  outline  of  this  module  can  be 
covered,  with  additional  topics  included  as  deemed 
appropriate.  The  suggestions  given  below  focus  on 
how  this  material  can  be  taught  in  a  seminar  format. 

The  instructor  delivers  an  introductory  lecture  in 
each  of  the  major  topic  areas.  Lectures  should  be 
based  on  references  in  the  “essential”  category 
(column  1  of  the  table).  A  subset  of  papers  from  the 
“recommended”  list  (column  2)  is  selected  to  be  read 
by  all  students;  one  student  should  act  as  presenter 
for  each  paper.  For  this  approach  to  succeed,  papers 
and  presenters  must  be  selected  well  in  advance,  and 
both  presenters  and  participants  must  be  prepared. 
To  ensure  this  advance  preparation,  the  instructor 
should: 

•  Approve  all  paper  selections. 

•  Meet  with  each  presenter  at  least  two  weeks  in 
advance  of  the  presentation  to  answer 
questions,  determine  presentation  format,  and 
together  write  a  set  of  exercises  for  the  other 
students. 

•  Distribute  the  assigned  reading  as  soon  as 
possible  and  the  set  of  exercises  at  least  one 
week  in  advance  of  the  presentation. 

•  Be  prepared  to  assist  each  student  at  his  or  her 
presentation,  if  necessary. 

This  approach  requires  discipline  on  everyone’s  part. 
Broad  coverage  of  material  is  aided  by  requiring
each  student  to  write  a  term  paper  in  one  of  the  areas 
related  to  the  course.  Readings  listed  in  the  “es¬ 
sential”  and  “recommended”  columns  provide 
breadth,  while  those  categorized  as  “detailed”  or 
“expert”  provide  depth. 

Suggested  coverage: 

• Background (1.0)

• Program Views and Related Analyses

    • Textual and syntactic views (1.0)
    • Control flow view (2.0)
    • Data flow view (4.0)
    • Data state view (3.0)
    • Functional view (2.0)

• Specification-Based Testing

    • Testing independent of the specification technique (1.0)
    • Testing dependent on the specification technique (3.0)

• Implementation-Based Testing

    • Structure-based testing (2.0)
    • Infection-based testing (3.0)
    • Propagation-based testing (3.0)

• Error-Oriented Testing

    • Error-based testing (3.0)
    • Fault-based testing (3.0)
    • Probable correctness (1.0)

• Hybrid Testing Techniques (3.0)

• Evaluation of Unit Analysis and Testing

    • Theoretical (2.0)
    • Empirical (2.0)

• Managerial Aspects of Unit Analysis and Testing

    • Selecting techniques (1.0)
    • Configuration items (1.0)
    • Test beds (1.0)

Total: 42 hours


Exercises 


It is not sufficient merely to study techniques; they must
be applied to software and evaluated. Fortunately there is
no lack of software to be verified! The traditional
project-oriented software engineering course clearly should
have a testing component. If a testing seminar is held
concurrently with such a course, the students taking the
seminar can act as an independent test organization, as tool
builders, as consultants, etc., for the software engineering
class. Alternatively, programs can be obtained from another
class or from industry for sustained testing.

In a testing seminar, the complementary benefits of
specification-based and implementation-based testing can be
illustrated by dividing the seminar participants into two
groups. Have each group produce a specification and a
fault-filled program. The specification-based group receives
the specification and object code from the
implementation-based group, which, in turn, receives only
the source code from the specification-based group. After
testing is complete, the groups compare results. Roles of
the two groups can then be reversed.

Testing  tools  are  prime  candidates  for  projects.  Ru¬ 
dimentary  test  beds,  data  flow  analyzers,  and  code 
instrumenters  can  be  implemented  in  one  term. 
Tools  developed  during  one  term  can  serve  both  as 
test  tools  and  test  objects  for  the  next. 


Suggested  Reading  Lists 


The  following  lists  categorize  items  in  the  bibliog¬ 
raphy  by  applicability.  “Essential”  reading  is  com¬ 
posed  of  references  that  provide  appropriate  entry 
points  into  the  literature  on  topics  treated  in  the 
module.  It  is  not  necessary  for  the  instructor  to  read 
every  one  of  these  references,  but  time  will  be  well 
spent  reading  those  addressing  topics  that  will  ac¬ 
tually  be  taught.  Many  items  in  this  category  are 
accessible  to  students  as  well.  “Recommended” 
reading  provides  additional  background,  building  on 
the  groundwork  laid  by  reading  from  the  essential 
list.  Readings  in  the  “Detailed”  list  are  narrower  in 
scope  and  generally  require  background  reading 
from  the  first  two  categories.  These  papers  can 
serve as the basis for class projects. “Expert” reading
requires background in areas of mathematics
such  as  computability  theory,  statistics,  or  algorithm 
analysis.  Most  of  the  papers  in  this  category  are  the¬ 
oretical. 
paring  the  Effectiveness  of  Software  Testing
Strategies.”  IEEE  Trans.  Software  Eng.  SE-13,  12 
(Dec.  1987),  1278-1296. 

Abstract: This study applies an experimentation methodology
to compare three state-of-the-practice software testing
techniques: a) code reading by stepwise abstraction, b)
functional testing using equivalence partitioning and
boundary value analysis, and c) structural testing using 100
percent statement coverage criteria. The study compares the
strategies in three aspects of software testing: fault
detection effectiveness, fault detection cost, and classes
of faults detected. Thirty-two professional programmers and
42 advanced students applied the three techniques to four
unit-sized programs in a fractional factorial experimental
design. The major results of this study are the following.
1) With the professional programmers, code reading detected
more software faults and had a higher fault detection rate
than did functional or structural testing, while functional
testing detected more faults than did structural testing,
but functional and structural testing were not different in
fault detection rate. 2) In one advanced student subject
group, code reading and functional testing were not
different in faults found, but were both superior to
structural testing, while in the other advanced student
subject group there was no difference among the techniques.
3) With the advanced student subjects, the three techniques
were not different in fault detection rate. 4) Number of
faults observed, fault detection rate, and total effort in
detection depended on the type of software tested. 5) Code
reading detected more interface faults than did the other
methods. 6) Functional testing detected more control faults
than did the other methods. 7) When asked to estimate the
percentage of faults detected, code readers gave the most
accurate estimates while functional testers gave the least
accurate estimates.

“Functional testing” corresponds to specification-based
testing, as used in this module. This paper should be read
as much for its detailed description of experimental design
as for its conclusions. The complexity of designing and
conducting an experiment of this magnitude is clearly
illustrated. The references provide a fairly complete list
of other studies that have compared the effectiveness of
software testing strategies.

This  paper  is  important  reading  for  anyone  thinking 
of  conducting  an  experiment  in  software  testing. 
Basic  statistical  competence  is  assumed. 

Bazzichi82

Bazzichi, Franco, and Ippolito Spadafora. “An Automatic
Generator for Compiler Testing.” IEEE Trans. Software Eng.
SE-8, 4 (July 1982), 343-353.

Abstract:  A  new  method  for  testing  compilers  is 
presented.  The  compiler  is  exercised  by  compatible 
programs,  automatically  generated  by  a  test  gener¬ 
ator.  The  generator  is  driven  by  a  tabular  descrip¬ 
tion  of  the  source  language.  This  description  is  in  a 
formalism  which  nicely  extends  context-free  gram¬ 
mars  in  a  context-dependent  direction,  but  still 
retains  the  structure  and  readability  of  BNF.  The 
generator  produces  a  set  of  programs  which  cover 
all  grammatical  constructions  of  the  source  lan¬ 
guage,  unless  user  supplied  directives  instruct 
otherwise.  The  programs  generated  can  also  be 
used  to  evaluate  the  performance  of  different  com¬ 
pilers  of  the  same  source  language. 

A  significant  example  from  Pascal  is  presented,  and 
experience  with  the  generator  is  reported. 

The  approach  taken  here  is  one  similar  to  that  of  a 
two-level  grammar  for  specifying  context  sensitiv¬ 
ity.  The  problems  inherent  in  specifying  semantic 
constraints  on  a  programming  language  are  clearly
presented.  However,  the  presentation  is  difficult  to 
understand  without  consulting  the  cited  references. 

This paper or [Duncan81] should be read by the instructor.
It is a difficult paper for students, though its goal should
be apparent.
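
The coverage goal stated in the abstract, a set of generated programs that together exercise every grammatical construction, can be illustrated on a toy grammar. The sketch below is not the authors' generator: it uses a plain context-free grammar without their context-dependent extension, and it relies on repeated random derivation, which is a substitution made here for brevity. The grammar and the names `derive` and `covering_sentences` are hypothetical.

```python
import random

# Hypothetical toy grammar; each nonterminal lists its productions,
# with a non-recursive production first so that expansion can be cut off.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["factor"], ["term", "*", "factor"]],
    "factor": [["x"], ["(", "expr", ")"]],
}
ALL_PRODUCTIONS = {
    (nt, i) for nt, alts in GRAMMAR.items() for i in range(len(alts))
}


def derive(symbol, depth, used, rng):
    """Expand `symbol` into a list of terminals, recording productions used."""
    if symbol not in GRAMMAR:
        return [symbol]
    alternatives = GRAMMAR[symbol]
    # Once the depth budget is spent, always take the non-recursive choice.
    index = rng.randrange(len(alternatives)) if depth > 0 else 0
    used.add((symbol, index))
    tokens = []
    for part in alternatives[index]:
        tokens.extend(derive(part, depth - 1, used, rng))
    return tokens


def covering_sentences(seed=0, depth=4):
    """Generate sentences until every production has been used at least once."""
    rng = random.Random(seed)
    covered, sentences = set(), []
    while covered != ALL_PRODUCTIONS:
        used = set()
        sentence = derive("expr", depth, used, rng)
        if not used <= covered:          # keep only sentences that add coverage
            sentences.append(" ".join(sentence))
            covered |= used
    return sentences
```

Each returned sentence is a well-formed expression over `x`, and together the sentences use all six productions of the toy grammar. A systematic enumeration would avoid the reliance on chance.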

Beizer90

Beizer, Boris. Software Testing Techniques, 2nd Ed. New
York: Van Nostrand Reinhold, 1990.

This  book  offers  enough  breadth  and  depth  to  war¬ 
rant  its  use  for  a  variety  of  academic  and  training 
courses.  Using  a  flamboyant  style  that  captures  the 
reader’s  attention,  Beizer  explains  a  multitude  of
testing  techniques  within  a  management  framework 
of  his  own  devising.  The  book  contains  an  exten¬ 
sive  taxonomy  of  bugs  (faults)  and  related  bug 
counts.  The  book  stresses  implementation-based 
testing,  especially  path  testing. 
Berztiss88

Berztiss,  Alfs,  and  Mark  A.  Ardis.  Formal  Verifi¬ 
cation  of  Programs.  Curriculum  Module  SEI-CM- 
20-1.0,  Software  Engineering  Institute,  Carnegie 
Mellon  University,  Pittsburgh,  Pa.,  Dec.  1988. 

Capsule Description: This module introduces formal
verification of programs. It deals primarily with proofs of
sequential programs, but also with consistency proofs for
data types and deduction of particular behaviors of programs
from their specifications. Two approaches are considered:
verification after implementation that a program is
consistent with its specification, and parallel development
of a program and its specification. An assessment of formal
verification is provided.

Brykczynski89 

Brykczynski,  Bill  R.,  and  Christine  Youngblut. 
Bibliography  of  Testing  and  Evaluation  Reference 
Material.  IDA  Memorandum  Report  M-496,  Insti¬ 
tute  for  Defense  Analyses,  Alexandria,  Va.,  Aug. 
1989. 

Over  1900  entries  with  full  abstracts  are  classified 
by  author  and  topic.  This  bibliography  is  a  com¬ 
panion  to  [Youngblut89]. 

Budd80

Budd,  Timothy  A.,  Richard  A.  DeMillo,  Richard 
J.  Lipton,  and  Frederick  G.  Sayward.  “Theoretical 
and  Empirical  Studies  on  Using  Program  Mutations 
to  Test  the  Functional  Correctness  of  Programs.” 
Conf.  Record  7th  Ann.  ACM  Symp.  on  Principles  of 
Prog.  Lang.  New  York:  ACM,  Jan.  1980,  220-233.

This  paper  presents  little-known  results  on  mutation 
testing, both theoretical and empirical. The theoretical
section can be safely ignored, except for the
analysis  of  decision  tables  and  straight-line  Lisp 
programs.  The  empirical  results  are  more  interest¬ 
ing,  since  they  provide  insight  into  the  mutant 
operators  used  and  their  success  on  buggy  pro¬ 
grams. 

The  theoretical  section  is  useful  only  for  those  who 
wish  to  pursue  mutation  testing  at  an  expert  level. 
The  empirical  section  is  of  some  use  in  demonstrat¬ 
ing  when  mutation  testing  does  and  does  not  work. 

Budd82 

Budd, Timothy A., and Dana Angluin. “Two Notions of
Correctness and Their Relation to Testing.” Acta Informatica
18, 1 (1982), 31-45.

Abstract: We consider two interpretations for what it means
for test data to demonstrate correctness. For each
interpretation, we examine under what conditions data
sufficient to demonstrate correctness exists, and whether it
can be automatically detected and/or generated. We establish
the relation between these questions and the problem of
deciding equivalence of two programs.

This  paper  requires  a  good  background  in  computa¬ 
bility  theory.  A  theoretical  analysis  of  mutation 
testing  is  presented  in  excellent  style. 

This  paper  is  for  experts.  Students  without  a  course 
in  computability  will  be  lost.

Carver91 

Carver,  Richard  H.,  and  Kuo-Chung  Tai.  “Replay 
and  Testing  for  Concurrent  Programs.”  IEEE  Soft¬ 
ware  8, 2  (March  1991),  66-74. 

This  is  a  good  starting  point  for  understanding  prob¬ 
lems  involved  in  testing  concurrent  programs.  The 
mechanism  described  by  the  authors  enables  suf¬ 
ficient  state  information  of  the  program  to  be  cap¬ 
tured  to  allow  prior  behavior  to  be  repeated  for  the 
purpose  of  debugging  or  testing. 

Cherniavsky87 

Cherniavsky, John C., and Carl H. Smith. “A Recursion
Theoretic Approach to Program Testing.” IEEE Trans.
Software Eng. SE-13, 7 (July 1987), 777-784.

Abstract:  Inductive  inference,  the  automatic  syn¬ 
thesis  of  programs,  bears  certain  ostensible  rela¬ 
tionships  with  program  testing.  For  inductive  in¬ 
ference,  one  must  take  a  finite  sample  of  the  desired 
input/output  behavior  of  some  program  and  pro¬ 
duce  (synthesize)  an  equivalent  program.  In  the 
testing  paradigm,  one  seeks  a  finite  sample  for  a 
function  such  that  any  program  (in  a  given  set) 
which  computes  something  other  than  the  object 
function  differs  from  the  object  function  on  the  finite
sample.  In  both  cases,  the  finite  sample  embodies 
sufficient  knowledge  to  isolate  the  desired  program 
from  all  other  possibilities.  These  relationships  are 
investigated  and  general  recursion  theoretic 
properties  of  testable  sets  of  functions  are  exposed. 

[Budd82] presents similar theory from a programming language
theoretic view. [Rowland81] presents related theory for a
narrower problem domain.

This  paper  requires  a  strong  background  in  the  nota¬ 
tions  and  conventions  of  computability  theory.  For 
experts  only. 

Clarke76 

Clarke,  Lori  A.  “A  System  to  Generate  Test  Data
and  Symbolically  Execute  Programs.”  IEEE  Trans. 
Software  Eng.  SE-2,  3  (Sept.  1976),  215-222. 

Abstract: This paper describes a system that attempts to
generate test data for programs written in ANSI Fortran.
Given a path, the system symbolically executes the path and
creates a set of constraints on the program's input
variables. If the set of constraints is linear, linear
programming techniques are employed to obtain a solution. A
solution to the set of constraints is test data that will
drive execution down the given path. If it can be determined
that the set of constraints is inconsistent, then the given
path is shown to be nonexecutable. To increase the chance of
detecting some of the more common programming errors,
artificial constraints are temporarily created that simulate
error conditions and then an attempt is made to solve each
augmented set of constraints. A symbolic representation of
the program's output variables in terms of the program's
input variables is also created. The symbolic representation
is in a human readable form that facilitates error detection
as well as being a possible aid in assertion generation and
automatic program documentation.

This  paper  reports  on  an  early  symbolic  execution 
system  that  allows  a  user  to  specify  interactively  the 
path  to  be  analyzed.  The  use  of  artificial  constraints 
foreshadows  the  application  of  symbolic  execution 
in  fault-based  testing. 

A  symbolic  execution  system  for  a  simple  language 
makes  an  excellent  class  project.  This  paper  (along 
with [Howden78b] and [Howden77]) provides sufficient
detail for implementing such a project.

Clarke82 

Clarke, Lori A., Johnette Hassell, and Debra J. Richardson.
“A Close Look at Domain Testing.” IEEE Trans. Software Eng.
SE-8, 4 (July 1982), 380-390.

Abstract:  White  and  Cohen  have  proposed  the 
domain  testing  method,  which  attempts  to  uncover 
errors  in  a  path  domain  by  selecting  test  data  on 
and  near  the  boundary  of  the  path  domain.  The 
goal  of  domain  testing  is  to  demonstrate  that  the 
boundary  is  correct  within  an  acceptable  error 
bound.  Domain  testing  is  intuitively  appealing  in 
that  it  provides  a  method  for  satisfying  the  often 
suggested  guideline  that  boundary  conditions 
should  be  tested. 

In  addition  to  proposing  the  domain  testing  method. 
White  and  Cohen  have  developed  a  test  data  selec¬ 
tion  strategy,  which  attempts  to  satisfy  this  method. 
Further,  they  have  described  two  error  measures 
for  evaluating  domain  testing  strategies.  This 
paper  takes  a  close  look  at  their  strategy  and  their 
proposed  error  measures.  It  is  shown  that  in¬ 
ordinately  large  domain  errors  may  remain  un¬ 
detected  by  the  White  and  Cohen  strategy.  Two 
alternative domain testing strategies, which improve on the
error bound, are then proposed and the complexity of each of
the three strategies is analyzed. Finally, several other
issues that must be addressed by domain testing are
presented and the general applicability of this method is
discussed.

This paper recommends the selection of additional test
points to narrow the range of domain shifts that remain
undetected by the domain testing strategy suggested in
[White80], which is prerequisite reading. The paper makes
several important suggestions for relaxing the restrictions
of [White80].

This  is  essential  reading  for  the  instructor  if  domain 
testing  is  to  be  discussed.  Also,  it  serves  as  a  good 
source  of  thought  questions  for  examinations.  It  is 
advanced  reading  for  students. 
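
The idea of testing on and near a border, and of shifts that escape detection, can be illustrated with a single linear border. The sketch below is a simplified illustration under stated assumptions, not the White and Cohen strategy or the alternatives proposed in the paper: it assumes a two-variable input, a closed border of the form x + y <= bound, and one ON point paired with one OFF point at distance `epsilon`. The function name `detects_shift` is hypothetical.

```python
def detects_shift(correct_bound, actual_bound, on_point, epsilon):
    """Return True if an ON/OFF point pair exposes a shifted border.

    The intended border is x + y <= correct_bound; the implementation
    under test is assumed to use x + y <= actual_bound instead.
    """
    x, y = on_point
    assert x + y == correct_bound, "ON point must lie on the intended border"
    off_point = (x, y + epsilon)          # just outside the closed border
    for px, py in (on_point, off_point):
        expected = px + py <= correct_bound
        observed = px + py <= actual_bound
        if expected != observed:          # the point lands in the wrong domain
            return True
    return False
```

With the ON point (5, 5) on the border x + y <= 10 and `epsilon` of 0.25, a border shifted outward to 10.5 is detected by the OFF point, and a border shifted inward to 9.5 is detected by the ON point. An outward shift to 10.125 goes undetected because it is smaller than `epsilon`, which is the sense in which the border is shown correct only within an error bound.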

Clarke89 

Clarke,  Lori  A.,  Andy  Podgurski,  Debra  J.  Richard¬ 
son,  and  Steven  J.  Zeil.  “A  Formal  Evaluation  of 
Data  Flow  Path  Selection  Criteria.”  IEEE  Trans. 
Software  Eng.  15, 11  (Nov.  1989),  1318-1332. 

Abstract:  A  number  of  path  selection  criteria  have 
been  proposed  throughout  the  years.  Unfor¬ 
tunately,  little  work  has  been  done  on  comparing 
these  criteria.  To  determine  what  would  be  an  ef¬ 
fective  path  selection  criterion  for  revealing  faults 
in  programs,  we  have  undertaken  an  evaluation  of
these  criteria.  This  paper  reports  on  the  results  of 
our  evaluation  of  path  selection  criteria  based  on 
data  flow  relationships.  We  show  how  these  crite¬ 
ria  relate  to  each  other,  thereby  demonstrating 
some  of  their  strengths  and  weaknesses.  In  addi¬ 
tion,  we  suggest  minor  changes  to  some  criteria  that 
improve  their  performance.  We  conclude  with  a 
discussion  of  the  major  limitations  of  these  criteria 
and  directions  for  future  research. 

The authors begin with a thorough overview of the three
principal data flow approaches to path selection (see
[Korel83], [Ntafos84], [Rapps85], and [Ntafos88]),
demonstrating the interrelationships among them using a
subsumption hierarchy. They then suggest modifications to
the path selection criteria to remedy noted deficiencies.
The authors discuss issues other than subsumption that must
be considered in evaluating the criteria: the effect of
infeasible paths, the relative cost of the criteria, and the
fault detection capabilities of the criteria.

By  collecting  the  various  data  flow  definitions  in 
one  place,  the  authors  have  done  everyone  a  service. 

If  any  one  paper  on  data  flow  analysis  is  to  be 
studied by the instructor, this is probably the appropriate
one. The paper, of course, contains a heavy
dose  of  graph  theory. 

Davis88

Davis, Martin, and Elaine J. Weyuker. “Metric Space-Based
Test-Data Adequacy Criteria.” Computer J. 31, 1 (Feb. 1988),
17-24.

Abstract:  Since  software  testing  cannot  ordinarily
be  expected  to  provide  conclusive  evidence  that  a 
program  is  correct,  software  engineers  have  had  to 
be  satisfied  with  the  vague  notion  of  a  set  of  test 
data  being  adequate  for  a  given  program.  In  this 
paper  a  theoretical  model  is  provided  for  the  notion 
of  adequacy.  Adequacy  criteria  are  seen  as  serving 
to  distinguish  a  given  program  from  a  certain  class 
of  programs.  In  particular,  notions  of  distance  be¬ 
tween  programs  are  studied,  and  adequacy  of  a  test 
set  is  taken  to  mean  that  the  set  successfully  distin¬ 
guishes  the  program  being  tested  from  all  programs 
that are sufficiently near to it, and differ in
input-output behaviour from the given program. Certain
points,  called  critical,  are  identified  which  must  oc¬ 
cur  in  every  adequate  test  set.  Finally,  lower 
bounds  are  obtained  on  the  size  of  test  sets  which 
are  minimally  adequate,  in  the  sense  that  they  have 
no  adequate  proper  subsets. 

The  concept  of  the  distance  between  two  programs 
is  defined  in  terms  of  transformations  needed  to 
convert  one  program  into  the  other.  The  adequacy 
discussed  is  with  respect  to  a  finite  neighborhood
(in  the  sense  of  [Budd82]),  limiting  the  usefulness  of 
the  results. 

The authors presume a strong background in programming
language theory. This paper is appropriate only for those
doing research into the theory of fault-based testing.

DeMillo78a 

DeMillo, Richard A., Richard J. Lipton, and Frederick
G. Sayward. “Hints on Test Data Selection: Help for the
Practicing Programmer.” Computer 11, 4 (April 1978), 34-41.
Reprinted in [Miller81].

This  paper  should  win  a  prize  for  introducing  more 
catchy  new  terms  than  any  other — mutation  testing, 
competent  programmer  hypothesis,  coupling  effect. 
Beware!  It  is  easy  to  fall  under  the  spell  of  the
latter  two  terms  and  assume  they  are  well-defined 
and  justified.  Beware  also  of  the  typographical  er¬ 
ror  that  occurs  in  several  places  on  page  37,  where 
‘r  is  substituted  for  ‘I’.  This  substitution  leads  to 
the  (wrong)  impression  that  mutation  testing  is  per¬ 
formed  conventionally  on  double  mutants.  The  de¬ 
scription  on  page  39  is  confusing  and  seems  to  im¬ 
ply  that  14  mutants  are  equivalent  to  the  original, 
yet four of them are not. [Duran81] draws exactly the
opposite conclusion based on the random generation example!

Despite  flaws,  this  paper  is  an  excellent  introduction
to  mutation  testing  and  is,  therefore,  essential  read¬ 
ing  for  both  instructor  and  student. 
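
The mechanics of mutation testing, including the equivalent mutants mentioned in the annotation, can be sketched on a very small unit. This is an illustrative sketch under stated assumptions rather than the procedure of the paper: mutants are built by swapping a comparison operator in a hypothetical two-argument maximum, and the names `make_max`, `MUTANTS`, and `surviving` are invented for the example.

```python
import operator


def make_max(compare):
    """Build a two-argument 'maximum' unit whose comparison is pluggable."""
    def unit(a, b):
        return a if compare(a, b) else b
    return unit


ORIGINAL = make_max(operator.gt)
MUTANTS = {
    "gt -> lt": make_max(operator.lt),
    "gt -> ge": make_max(operator.ge),   # equivalent mutant: same outputs
    "gt -> ne": make_max(operator.ne),
}


def surviving(tests):
    """Names of mutants that no test in `tests` distinguishes from ORIGINAL."""
    return sorted(
        name for name, mutant in MUTANTS.items()
        if all(mutant(*t) == ORIGINAL(*t) for t in tests)
    )
```

The single test `(2, 1)` kills only the `lt` mutant. Adding `(1, 2)` also kills the `ne` mutant, leaving `gt -> ge`, which no test can kill because it computes the same function as the original.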


DeMillo78b

DeMillo,  Richard  A.,  and  Richard  J.  Lipton.  “A 
Probabilistic Remark on Algebraic Program Testing.”
Information Processing Letters 7, 4 (June
1978),  193-195. 

This  is  the  first  paper  to  introduce  the  concept  of 
determining  statistical  confidence  in  the  absence  of 
faults  in  programs  that  compute  functions  in  a  spec¬ 
ified  class  (e.g.,  polynomials). 

The  paper  is  in-depth  reading  for  the  instructor. 
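
The flavor of such a confidence argument can be shown in the univariate case, which is a simplifying assumption made here and not a summary of the paper's result. Two distinct polynomials of degree at most d agree on at most d points, so a point drawn uniformly from N candidates fails to separate them with probability at most d/N. The function names and the example polynomials below are hypothetical.

```python
import random


def undetected_bound(degree, sample_size, trials):
    """Upper bound on the chance that `trials` random points all miss a fault.

    Assumes program and specification are distinct univariate polynomials
    of degree <= `degree`, and that points are drawn independently and
    uniformly from `sample_size` candidates.
    """
    return (degree / sample_size) ** trials


def agree_on_random_points(program, reference, sample_size, trials, rng):
    """True if both functions agree on `trials` points from range(sample_size)."""
    points = [rng.randrange(sample_size) for _ in range(trials)]
    return all(program(x) == reference(x) for x in points)


def reference(x):        # hypothetical specification: x^2 + 1
    return x * x + 1


def faulty(x):           # hypothetical faulty program: x^2 + x
    return x * x + x
```

Here `faulty` and `reference` coincide only at x = 1, so a handful of random points is very likely to expose the fault. Conversely, agreement on every sampled point supports a quantified confidence that no fault of this class is present.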

DeMillo88 

DeMillo,  Richard  A.,  et  al.  “An  Extended  Overview 
of  the  Mothra  Software  Testing  Environment.”  Proc. 
Second  Workshop  on  Software  Testing,  Verification, 
and  Analysis.  Washington,  D.C.:  IEEE  Computer 
Society  Press,  1988, 142-151. 

Abstract:  Mothra  is  a  software  testing  environment 
that  supports  mutation-based  testing  of  software 
systems.  Mothra  is  interactive;  it  provides  a  high- 
bandwidth  user  interface  to  make  software  testing 
faster and less painful. Mothra currently runs on a variety
of systems under 4.3BSD UNIX, UNIX System V, and ULTRIX-32
1.2. This paper begins with a brief introduction to mutation
analysis. We then take the reader on a guided tour of
Mothra, emphasizing how it interacts with the tester. Then
we present [sic] with a short discussion of Mothra’s
internal  design.  Next,  we  discuss  some  major  prob¬ 
lems  with  using  mutation  analysis  and  discuss  pos¬ 
sible  solutions.  We  conclude  by  presenting  a  solu¬ 
tion  to  one  of  these  problem  [sic] — a  new  method  of 
automatically  generating  mutation-adequate  test 
data. 

The  authors  present  an  overview  of  mutation  anal¬ 
ysis  and  its  problems,  and  they  describe  the  Mothra 
environment,  which  supports  mutation  testing.  A 
principal  problem  is  the  generation  of  mutation- 
adequate  test  data;  the  authors  discuss  heuristics  for 
the  generation  of  requisite  test  data. 

This  article  is  appropriate  for  the  student  interested 
in  mutation  operators  and  the  operation  of  a  muta¬ 
tion  system. 

Duncan81 

Duncan, A. G., and J. S. Hutchinson. “Using Attributed
Grammars to Test Designs and Implementations.” 5th Intl.
Conf. on Software Eng. New York: IEEE, March 1981, 170-178.

Abstract: We present a method for generating test cases that
can be used throughout the entire life cycle of a program.
This method uses attributed translation grammars to generate
both inputs and outputs, which can then be used either as
is, in order to test the specifications, or in conjunction
with automatic test drivers to test an implementation
against the specifications.

The  grammar  can  generate  test  cases  either  ran¬ 
domly  or  systematically.  The  attributes  are  used  to 
guide  the  generation  process,  thereby  avoiding  the 
generation  of  many  superfluous  test  cases.  The 
grammar  itself  not  only  drives  the  generation  of  test 
cases  but  also  serves  as  a  concise  documentation  of 
the  test  plan. 

In  the  paper,  we  describe  the  test  case  generator, 
show  how  it  works  in  typical  examples,  compare  it 
with  related  techniques,  and  discuss  how  it  can  be 
used  in  conjunction  with  various  testing  heuristics. 

This  is  a  practical  paper  on  the  means  of  generating 
test  data  based  on  a  BNF  grammar.  The  use  of 
“attributes”  here  is  unconventional  and  is  not  direct¬ 
ly  related  to  attribute  grammars. 

This  paper  or  [Bazzichi82]  should  be  read  by  the 
instructor.  It  can  be  useful  for  in-depth  study  by  the 
student. 

Duran80

Duran, Joe W., and John J. Wiorkowski. “Quantifying
Software Validity by Sampling.” IEEE Trans.
on Reliability R-29, 2 (June 1980), 141-144.

Abstract:  The  point  of  all  validation  techniques  is 
to  raise  assurance  about  the  program  under  study, 
but  no  current  methods  can  be  realistically  thought 
to  give  100%  assurance  that  a  validated  program 
will  perform  correctly.  There  are  currently  no  use¬ 
ful  ways  for  quantifying  how  'well-validated'  a  pro¬ 
gram  is.  One  measure  of  program  correctness  is 
the  proportion  of  elements  in  the  program's  input 
domain  for  which  it  fails  to  execute  correctly,  since 
the  proportion  is  zero  i.f.f.  the  program  is  correct. 
This  proportion  can  be  estimated  statistically  from 
the  results  of  program  tests  and  from  prior  subjec¬ 
tive  assessments  of  the  program's  correctness. 
Three examples are presented of methods for determining
s-confidence bounds on the failure proportion.
It is shown that there are reasonable conditions
(for programs with a finite number of paths)
for  which  ensuring  the  testing  of  all  paths  does  not 
give  better  assurance  of  program  correctness. 

The  authors  are  interested  in  program  testing,  par¬ 
ticularly  in  quantifying  how  well  a  program  has 
been  tested.  Both  random  testing  and  path  testing 
are  considered.  A  strong  statistical  background  is 
presumed. 

This  is  expert  reading  for  the  instructor. 
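
As a rough illustration of the kind of bound discussed in this paper, the sketch below computes a classical upper confidence bound on the failure proportion after a run of failure-free random tests. It is a standard textbook bound, not necessarily one of the three methods the authors present, and it assumes test inputs are drawn independently from the input domain. The function name and parameters are hypothetical.

```python
def failure_proportion_upper_bound(n_tests, confidence=0.95):
    """Upper bound on the failure proportion after n_tests failure-free tests.

    If the true failure proportion is theta, the chance of seeing no
    failure in n independent random tests is (1 - theta) ** n.  The bound
    is the largest theta for which that chance is still at least
    1 - confidence.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_tests)
```

The bound shrinks only slowly as tests are added, which is one way to see why no practical amount of testing gives 100% assurance.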

Duran81

Duran,  Joe  W.,  and  John  J.  Wiorkowski.  “Capture- 
Recapture  Sampling  for  Estimating  Software  Error 
Content.”  IEEE  Trans.  Software  Eng.  SE-7,  1  (Jan. 
1981),  147-148. 


Abstract:  Mills'  capture-recapture  sampling  meth¬ 
od  allows  the  estimation  of  the  number  of  errors  in 
a  program  by  randomly  inserting  known  errors  and 
then  testing  the  program  for  both  inserted  and  in¬ 
digenous  errors.  This  correspondence  shows  how 
correct confidence limits and maximum likelihood
estimates  can  be  obtained  from  the  test  results. 
Both  fixed  sample  size  testing  and  sequential  testing 
are  considered. 

It  is  essential  that  Mills’s  original  article  be  read 
first  (see  Chapter  9  of  [Mills83]).  A  strong  statistics 
background  is  needed. 

This  reading  is  for  experts  only. 
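
The point estimate behind the seeding method can be sketched as follows. This is only the classical ratio estimate, under the assumption that seeded and indigenous errors are equally likely to be detected; it omits the confidence limits and sequential analysis that the correspondence derives. The function name and arguments are hypothetical.

```python
def estimate_indigenous_errors(seeded_total, seeded_found, indigenous_found):
    """Ratio estimate of the number of indigenous errors in a program.

    seeded_total     -- known errors deliberately inserted before testing
    seeded_found     -- how many of the inserted errors testing revealed
    indigenous_found -- how many original (non-inserted) errors it revealed
    """
    if seeded_found <= 0:
        raise ValueError("no seeded errors found; the estimate is undefined")
    if seeded_found > seeded_total:
        raise ValueError("cannot find more seeded errors than were inserted")
    # If testing found the fraction seeded_found / seeded_total of the
    # seeded errors, assume it found the same fraction of indigenous ones.
    return indigenous_found * seeded_total / seeded_found
```

For example, if 10 of 20 seeded errors and 4 indigenous errors are found, the estimate is 8 indigenous errors in total.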

Duran84 

Duran,  Joe  W.,  and  Simeon  C.  Ntafos.  “An  Evalu¬ 
ation  of  Random  Testing.”  IEEE  Trans.  Software 
Eng. SE-10, 4 (July 1984), 438-444.

Abstract:  Random  testing  of  programs  has  usually 
(but  not  always)  been  viewed  as  a  worst  case  of 
program  testing.  Testing  strategies  that  take  into 
account  the  program  structure  are  generally 
preferred.  Path  testing  is  an  often  proposed  ideal 
for  structural  testing.  Path  testing  is  treated  here 
as  an  instance  of  partition  testing,  where  by  par¬ 
tition  testing  is  meant  any  testing  scheme  which 
forces  execution  of  at  least  one  test  case  from  each 
subset of a partition of the input domain. Simulation
results are presented which suggest that random
testing may often be more cost effective than
partition  testing  schemes.  Also,  results  of  actual 
random  testing  experiments  are  presented  which 
confirm  the  viability  of  random  testing  as  a  useful 
validation  tool. 

This  paper  challenges  many  ideas  about  program 
testing,  especially  the  notion  that  random  testing  is 
of  no  value.  Experiments  were  conducted  to  vali¬ 
date  an  error  model,  and  the  structural  coverage  ac¬ 
complished  by  such  testing  is  reported.  Knowledge 
of  statistics  helps. 

This  is  essential  reading  for  the  instructor.  It  is 
challenging  for  the  student,  but  it  should  be  read. 
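
The comparison at the heart of this paper can be sketched with a simple failure-rate model. The code below is a minimal illustration, assuming each subdomain i is hit by a random input with probability probs[i] and has failure rate rates[i]; the function names and the example numbers are invented for illustration and are not taken from the paper.

```python
from math import prod


def p_detect_random(probs, rates, n_tests):
    """Chance that n_tests random inputs reveal at least one failure."""
    # Overall failure rate is the subdomain rates weighted by how often
    # a random input lands in each subdomain.
    theta = sum(p * t for p, t in zip(probs, rates))
    return 1.0 - (1.0 - theta) ** n_tests


def p_detect_partition(rates, tests_per_subdomain=1):
    """Chance that testing every subdomain reveals at least one failure."""
    # Testing misses every failure only if each subdomain's tests all pass.
    return 1.0 - prod((1.0 - t) ** tests_per_subdomain for t in rates)
```

With two equally likely subdomains and failure rates 0.0 and 0.2, two random tests detect a failure with probability 0.19, while one test per subdomain does so with probability 0.20. The advantage of partitioning is small unless failures are concentrated in subdomains that random inputs rarely reach.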

Fosdick76

Fosdick,  Lloyd  D.,  and  Leon  J.  Osterweil.  “Data 
Flow  Analysis  in  Software  Reliability.”  ACM  Com¬ 
puting  Surveys  8,  3  (Sept.  1976),  305-330. 

Abstract:  The  ways  that  the  methods  of  data  flow 
analysis  can  be  applied  to  improve  software 
reliability  are  described.  There  is  also  a  review  of 
the basic terminology from graph theory and from
data  flow  analysis  in  global  program  optimization. 
The  notation  of  regular  expressions  is  used  to  de¬ 
scribe  actions  on  data  for  sets  of  paths.  These  ex¬ 
pressions  provide  the  basis  of  a  classification 
scheme  for  data  flow  which  represents  patterns  of 
data flow along paths within subprograms and
along  paths  which  cross  subprogram  boundaries. 
Fast  algorithms,  originally  introduced  for  global 
optimization,  are  described  and  it  is  shown  how 
they  can  be  used  to  implement  the  classification 
scheme.  It  is  then  shown  how  these  same  algo¬ 
rithms  can  also  be  used  to  detect  the  presence  of 
data  flow  anomalies  which  are  symptomatic  of  pro¬ 
gramming  errors.  Finally,  some  characteristics  of 
and  experience  with  DAVE,  a  data  flow  analysis 
system  embodying  some  of  these  ideas,  are  de¬ 
scribed. 

This  article  is  a  most  readable  and  thorough  intro¬ 
duction  to  data  flow  analysis.  Read  this  first  and 
compare  with  [Jachner84]. 

This is essential reading for both instructor and student.
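
The idea of describing actions on data as strings, and anomalies as patterns within them, can be sketched for a single variable along a single path. The sketch assumes the usual action alphabet (d = define, r = reference, u = undefine) and three common anomaly patterns; it does not attempt the fast algorithms over sets of paths that the article describes, and the function name is hypothetical.

```python
import re

# Each character records one action on a single variable along one path:
# d = define, r = reference, u = undefine.
ANOMALIES = {
    "ur": "reference to an undefined value",
    "dd": "definition overwritten without an intervening reference",
    "du": "definition undefined without ever being referenced",
}


def find_anomalies(actions):
    """Return sorted (position, pattern, message) for each anomalous pair."""
    found = []
    for pattern, message in ANOMALIES.items():
        # A lookahead match lets overlapping pairs such as "ddu" be reported.
        for match in re.finditer(f"(?={pattern})", actions):
            found.append((match.start(), pattern, message))
    return sorted(found)
```

For the action string "udrddu", the sketch reports "dd" at position 3 and "du" at position 4; the string "udrdru" contains no anomaly.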

Foster80

Foster, Kenneth A. “Error Sensitive Test Cases
Analysis (ESTCA).” IEEE Trans. Software Eng.
SE-6, 3 (May 1980), 258-264.

Abstract:  A  hardware  failure  analysis  technique 
adapted  to  software  yielded  three  rules  for  gener¬ 
ating  test  cases  sensitive  to  code  errors.  These 
rules,  and  a  procedure  for  generating  these  cases, 
are  given  with  examples.  Areas  for  further  study 
are  recommended. 

A set of error-sensitive test case analysis rules is
given for producing inputs that are “error-sensitive.”
The rules are ad hoc, and no theoretical
justification  is  given  for  them.  Results  of  this  paper 
are  clarified  in  Software  Engineering  Notes  10,  1 
(Jan.  1985),  62-67. 

This  paper  contains  many  classical  examples  and  is 
useful for that reason. It is not essential reading, but
it raises many questions about why the proposed
ideas seem to work.

Frankl88 

Frankl, Phyllis G., and Elaine J. Weyuker. “An Applicable
Family of Data Flow Testing Criteria.”
IEEE Trans. Software Eng. 14, 10 (Oct. 1988),
1483-1498.

Abstract: A test data adequacy criterion is a predicate
which is used to determine whether a program
has been tested "enough." An adequacy criterion
is applicable if for every program there exists a set
of test data for the program which satisfies the criterion.
Most test data adequacy criteria based on
path  selection  fail  to  satisfy  the  applicability  prop¬ 
erty  because,  for  some  programs  with  unexecutable 
paths,  no  adequate  set  of  test  data  exists. 

In  this  paper,  we  extend  the  definitions  of  the 
previously introduced family of data flow testing
criteria to apply to programs written in a large subset
of Pascal. We then define a family of adequacy
criteria  called  feasible  data  flow  testing  criteria, 
which  are  derived  from  the  data  flow  testing  crite¬ 
ria.  The  feasible  data  flow  testing  criteria  circum¬ 
vent  the  problem  of  nonapplicability  of  the  data 
flow  testing  criteria  by  requiring  the  test  data  to 
exercise  only  those  definition-use  associations 
which are executable. We show that there are significant
differences between the relationships
among  the  data  flow  testing  criteria  and  the 
relationships  among  the  feasible  data  flow  testing 
criteria. 

We also discuss a generalized notion of the executability
of a path through a program unit. A
script  of  a  testing  session  using  our  data  flow  test¬ 
ing  tool,  ASSET,  is  included  in  the  Appendix. 

The  emphasis  in  this  paper  is  on  the  term  “fea¬ 
sible.”  [Clarke89]  points  out  that  this  shift  in  con¬ 
cern  does  not  entirely  resolve  undecidable  issues.  It 
is  crucial  to  read  [Rapps85]  before  reading  this 
paper,  and  perhaps  [Clarke89]  as  well. 

This  paper  requires  significant  background  in  data 
flow  testing. 

Gannon81 

Gannon, John, Paul R. McMullin, and Richard
G. Hamlet. “Data-Abstraction, Implementation,
Specification, and Testing.” ACM Trans. Prog.
Lang. and Syst. 3, 3 (July 1981), 211-223.

Abstract:  A  compiler-based  system  DAISTS  that 
combines  a  data-abstraction  language  (derived 
from the SIMULA class) with specification by algebraic
axioms is described. The compiler,
presented with two independent syntactic objects in
the  axioms  and  implementing  code,  compiles  a 
"program"  that  consists  of  the  former  as  test  driver 
for  the  latter.  Data  points,  in  the  form  of  expres¬ 
sions  using  the  abstract  functions  and  constant 
values,  are  fed  to  this  program  to  determine  if  the 
implementation  and  axioms  agree.  Along  the  way, 
structural  testing  measures  can  be  applied  to  both 
code  and  axioms  to  evaluate  the  test  data.  Although 
a  successful  test  does  not  conclusively  demonstrate 
the  consistency  of  axioms  and  code,  in  practice  the 
tests  are  seldom  successful,  revealing  errors.  The 
advantage  over  conventional  programming  systems 
is  threefold: 

(1)  The  presence  of  the  axioms  eliminates  the  need 
for  a  test  oracle:  only  inputs  need  be  supplied. 

(2)  Testing  is  automated:  a  user  writes  axioms, 
implementation, and test points; the system writes
the  test  drivers. 

(3)  The  results  of  tests  are  often  surprising  and 
helpful because it is difficult to get away with
"trivial" tests: what is not significant for the code is
liable to be a severe test of the axioms, and vice
versa. 

The  system  described  here  covers  diverse  aspects  of 
program  testing.  It  is  a  specification-dependent 
hybrid  approach  that  takes  advantage  of  the  or¬ 
thogonality  between  implementations  and  algebraic 
axioms. 

This  paper  is  recommended  reading  for  the  instruc¬ 
tor.  With  some  background  in  algebraic  specifi¬ 
cation,  students  can  readily  comprehend  the  system. 
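
The idea of using axioms as a test driver can be sketched outside DAISTS. The code below is an illustrative analogue only: it uses a hypothetical immutable stack and two familiar stack axioms, and it checks them at user-supplied test points, so that no separate oracle is needed. None of the names come from the paper, and DAISTS itself is compiler-based rather than a library.

```python
class Stack:
    """Implementation under test: a small immutable stack (illustrative)."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def push(self, x):
        return Stack(self._items + (x,))

    def pop(self):
        return Stack(self._items[:-1])

    def top(self):
        return self._items[-1]

    def __eq__(self, other):
        return isinstance(other, Stack) and self._items == other._items


# Each axiom is (name, left-hand side, right-hand side); both sides are
# evaluated at a test point (s, x) and must agree.
AXIOMS = [
    ("pop(push(s, x)) = s", lambda s, x: s.push(x).pop(), lambda s, x: s),
    ("top(push(s, x)) = x", lambda s, x: s.push(x).top(), lambda s, x: x),
]


def run_axioms(test_points):
    """Return the (axiom name, s, x) triples where code and axioms disagree."""
    failures = []
    for name, lhs, rhs in AXIOMS:
        for s, x in test_points:
            if lhs(s, x) != rhs(s, x):
                failures.append((name, s, x))
    return failures
```

Calling run_axioms([(Stack(), 1), (Stack([1, 2]), 3)]) returns an empty list for this implementation; a faulty pop or top would show up as a disagreement with one of the axioms.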

Gerhart76 

Gerhart, Susan L., and Lawrence Yelowitz. “Observations
of Fallibility in Applications of Modern Programming
Methodologies.” IEEE Trans. Software
Eng. SE-2, 3 (Sept. 1976), 195-207.

Abstract:  Errors,  inconsistencies,  or  confusing 
points  are  noted  in  a  variety  of  published  algo¬ 
rithms,  many  of  which  are  being  used  as  examples 
in  formulating  or  teaching  principles  of  such 
modern  programming  methodologies  as  formal 
specification,  systematic  construction,  and  correct¬ 
ness  proving.  Common  properties  of  these  points  of 
contention  are  abstracted.  These  properties  are 
then  used  to  pinpoint  possible  causes  of  the  errors 
and  to  formulate  general  guidelines  which  might 
help  to  avoid  further  errors.  The  common  charac¬ 
teristic  of  mathematical  rigor  and  reasoning  in 
these  examples  is  noted,  leading  to  some  discussion 
about  fallibility  in  mathematics,  and  its  relationship 
to  fallibility  in  these  programming  methodologies. 
The  overriding  goal  is  to  cast  a  more  realistic  per¬ 
spective  on  the  methodologies,  particularly  con¬ 
structive  recommendations  for  their  improvement. 

This  paper  is  a  masterpiece  of  analysis  of  how  er¬ 
rors  occur  in  the  life  cycle.  Though  the  authors 
“nit-pick”  in  places,  they  succeed  in  convincing  the 
most  adamant  skeptic  of  the  need  for  dynamic  test¬ 
ing  of  computer  programs.  The  paper  is  best  under¬ 
stood  after  some  formal  specifications  and  proofs  of 
correctness are attempted.

This  paper  is  essential  reading  for  both  instructor 
and student.

Goodenough75 

Goodenough,  John  B.,  and  Susan  L.  Gerhart. 
“Toward  a  Theory  of  Test  Data  Selection.”  IEEE 
Trans.  Software  Eng.  SE-1,  2  (June  1975),  156-173. 
Reprinted  in  [Miller81]. 

Abstract:  This  paper  examines  the  theoretical  and 
practical  role  of  testing  in  software  development. 

We  prove  a  fundamental  theorem  showing  that 
properly  structured  tests  are  capable  of  demonstrat¬ 
ing  the  absence  of  errors  in  a  program.  The 
theorem's proof hinges on our definition of test
reliability and validity, but its practical utility
hinges  on  being  able  to  show  when  a  test  is  actually 
reliable.  We  explain  what  makes  tests  unreliable 
(for  example,  we  show  by  example  why  testing  all 
program  statements,  predicates,  or  paths  is  not 
usually  sufficient  to  insure  test  reliability),  and  we 
outline  a  possible  approach  to  developing  reliable 
tests.  We  also  show  how  the  analysis  required  to 
define  reliable  tests  can  help  in  checking  a 
program's  design  and  specifications  as  well  as  in 
preventing  and  detecting  implementation  errors. 

Despite the flaws indicated in [Weyuker80], this
paper  remains  a  classic.  It  is  essential  reading  for 
the  instructor.  Students  find  it  very  difficult;  do  not 
use  it  as  an  introduction  to  testing! 

Gourlay83

Gourlay,  John  S.  “A  Mathematical  Framework  for 
the  Investigation  of  Testing.”  IEEE  Trans.  Software 
Eng. SE-9, 6 (Nov. 1983), 686-709.

Abstract:  Testing  has  long  been  in  need  of  math¬ 
ematical  underpinnings  to  explain  its  value  as  well 
as  its  limitations.  This  paper  develops  and  applies 
a  mathematical  framework  that  1)  unifies  previous 
work  on  the  subject,  2)  provides  a  mechanism  for 
comparing  the  power  of  methods  of  testing  pro¬ 
grams  based  on  the  degree  to  which  the  methods 
approximate  program  verification,  and  3)  provides 
a reasonable and useful interpretation of the notion
that successful tests increase one's confidence in
the  program's  correctness. 

Applications of the framework include confirmation
of  a  number  of  common  assumptions  about  prac¬ 
tical  testing  methods.  Among  the  assumptions  con¬ 
firmed  is  the  need  for  generating  tests  from  specifi¬ 
cations  as  well  as  programs.  On  the  other  hand,  a 
careful  formal  analysis  shows  that  the  "competent 
programmer  hypothesis"  does  not  suffice  to  ensure 
the claimed high reliability of mutation testing.
Hardware  testing  is  shown  to  fit  into  the  framework 
as  well,  and  a  brief  consideration  of  it  shows  how 
the  practical  differences  between  it  and  software 
testing  arise. 

This  paper  is  expert  reading. 

Hamlet77a 

Hamlet, Richard G. “Testing Programs with Finite
Sets  of  Data.”  Computer  J.  20,  3  (Aug.  1977), 
232-237. 

Abstract:  The  techniques  of  compiler  optimization 
can  be  applied  to  aid  a  programmer  in  writing  a 
program  which  cannot  be  improved  by  these  tech¬ 
niques.  A  finite,  representative  set  of  test  data  can 
be  useful  in  this  process.  This  paper  presents  the 
theoretical  basis  for  the  (nonconstructive)  existence 
of test sets which serve as maximally effective
stand-ins for an unlimited number of input possibilities.
It is argued that although the time required
by a compiler to fully exercise a program on
a set of data may be large, the corresponding improvement
in the reliability of the program may also
be large if the set meets the given theoretical requirements.

As  a  theoretical  companion  to  [Hamlet77b],  this 
paper  explores  the  notion  of  assessing  test  data  ade¬ 
quacy  via  program  mutations.  The  article  requires 
some  background  in  computability,  especially  in  re¬ 
duction  proofs  involving  the  halting  problem. 

The  paper  could  be  used  as  an  introduction  to  com¬ 
putability  for  students  with  limited  background;  all 
its  theorems  are  relevant  to  issues  involved  in  pro¬ 
gram  testing. 

Hamlet77b

Hamlet, Richard G. “Testing Programs with the Aid
of a Compiler.” IEEE Trans. Software Eng. SE-3, 4
(July 1977), 279-290.

Abstract:  If  finite  input-output  specifications  are 
added  to  the  syntax  of  programs,  these  specifica¬ 
tions  can  be  verified  at  compile  time.  Programs 
which  carry  adequate  tests  with  them  in  this  way 
should  be  resistant  to  maintenance  errors.  If  the 
specifications  are  independent  of  program  details 
they  are  easy  to  give,  and  unlikely  to  contain  errors 
in  common  with  the  program.  Furthermore,  certain 
finite  specifications  are  maximal  in  that  they  ex¬ 
ercise  the  control  and  expression  structure  of  a  pro¬ 
gram  as  well  as  any  tests  can. 

A  testing  system  based  on  a  compiler  is  described, 
in  which  compiled  code  is  utilized  under  interactive 
control,  but  "semantic"  errors  are  reported  in  the 
style  of  conventional  syntax  errors.  The  implemen¬ 
tation  is  entirely  in  the  high-level  language  on 
which  the  system  is  based,  using  some  novel  ideas 
for  improving  documentation  without  sacrificing  ef¬ 
ficiency. 

This  paper  provides  an  excellent  description  of  a 
system  that  anticipated  many  of  the  fault-based 
methods  of  program  testing  practice  and  theory.  It 
represents the first fault-based system using program
mutation in a context that determines test data
adequacy  by  demonstrating  that  no  simpler  pro¬ 
grams  can  be  substituted  for  the  original  and  still 
pass  the  test. 

The  paper  is  easy  to  understand  and  motivates  dis¬ 
cussion  of  mutation  testing  and  test  data  adequacy. 

It  is  recommended  reading  for  the  instructor;  for  the 
student,  it  provides  an  interesting  comparison  of 
tradeoffs  among  mutation  methods. 


Hamlet87 

Hamlet, Richard G. “Probable Correctness Theory.”
Information Processing Letters 25, 1 (April 1987),
17-25.

Abstract:  A  theory  of  'probable  correctness'  is  pro¬ 
posed  to  assess  the  reliability  of  software  through 
testing.  Current  research  in  testing  is  not  adequate 
for  this  assessment.  Most  testing  methods  are  in¬ 
tended  for  debugging,  to  find  failures  and  connect 
them  to  program  faults  for  repair.  When  these 
methods  no  longer  expose  errors,  no  analysis  has 
been  done  to  find  the  confidence  that  may  be  placed 
in  the  software.  (Preliminary  results  here  are  that 
this  confidence  should  be  low.)  Other  work  applies 
conventional  decision  theory  to  inputs  as  samples  of 
a  program's  use.  The  application  is  suspect  be¬ 
cause  the  necessary  independence  and  distribution 
assumptions  may  be  violated;  in  any  case,  the 
results  are  intuitively  incorrect.  The  proposed  the¬ 
ory  relies  on  a  uniform  distribution  of  test  samples, 
but  relates  these  to  textually  occurring  faults. 
Preliminary  results  include  an  analysis  of  partition 
testing,  and  suggestions  for  textual  sampling.  It  is 
crucial that any such confidence theory be plausible,
so the foundations of program sampling are
examined  in  detail. 

This  paper  lays  the  foundation  for  a  new  area  of 
investigation in program testing. Probable correctness
theory estimates the probability that a program
has  no  faults.  (Reliability  theory,  on  the  other  hand, 
tries  to  bound  the  probability  that  a  program  will 
fail.)  This  theory  provides  a  means  of  computing 
bounds  on  the  trustworthiness  of  software.  The  the¬ 
ory  is  improved  in  [Hamlet90]. 

Understanding  the  probability  model  developed 
here  requires  a  significant  investment  on  the  part  of 
the  reader.  It  explores  different  sample  spaces  in 
which  faults  may  lie. 

This  paper  is  essential  reading  for  the  instructor 
who  wants  to  discuss  the  statistical  confidence  that 
can be associated with a successful test.

Hamlet88 

Hamlet, Richard G. “Special Section on Software
Testing.” Comm. ACM 31, 6 (June 1988), 662-667.

Hamlet  provides  an  overview  of  three  papers  on 
testing.  He  discusses  difficulties  encountered  in  try¬ 
ing  to  infer  statistical  confidence  based  upon  test 
results. 

Hamlet90 

Hamlet, Richard G., and Ross Taylor. “Partition
Testing Does Not Inspire Confidence.” IEEE Trans.
Software Eng. 16, 12 (Dec. 1990), 1402-1411.

Abstract:  Partition  testing,  in  which  a  program's 
input domain is divided according to some rule and
tests  conducted  within  the  subdomains,  enjoys  a 
good  reputation.  However,  comparison  between 
testing  that  observes  subdomain  boundaries  and 
random  sampling  that  ignores  the  partition  gives 
the  counterintuitive  result  that  partitioning  is  of  lit¬ 
tle  value.  In  this  paper  we  improve  the  negative 
results  published  about  partition  testing,  and  try  to 
reconcile  them  with  its  intuitive  value.  Theoretical 
models  allow  us  to  study  partition  testing  in  the  ab¬ 
stract,  and  to  describe  the  circumstances  under 
which  it  should  perform  well  at  failure  detection. 
Partition  testing  is  shown  to  be  more  valuable  when 
the  partitions  are  narrowly  based  on  expected 
failures  and  there  is  a  good  chance  that  failures 
occur. For gaining confidence from successful
tests,  partition  testing  as  usually  practiced  has  little 
value. 

By  “confidence,”  the  author  means  statistical  con¬ 
fidence.  The  paper  challenges  many  long-held 
beliefs  about  the  value  of  partition  testing,  in  partic¬ 
ular,  its  value  in  ensuring  overall  confidence  in  the 
correct  operation  of  the  program.  Hamlet  reviews 
and  extends  the  work  of  Duran  and  Ntafos 
[Duran84]  on  random  testing  on  a  failure-rate  model 
of program confidence and presents his own defect-rate
model of probable correctness for assessing
partition  testing. 

This  paper  requires  some  background  in  probability 
and  statistics.  It  may  be  necessary  to  read  some  of 
the  cited  references  before  fully  comprehending  the 
material. 

Hantler76 

Hantler, Sidney L., and James C. King. “An Introduction
to Proving the Correctness of Programs.”
ACM Computing Surveys 8, 3 (Sept. 1976), 331-353.
Reprinted in [Miller81].

Abstract:  This  paper  explains,  in  an  introductory 
fashion, the method of specifying the correct behavior
of a program by the use of input/output
assertions  and  describes  one  method  for  showing 
that  the  program  is  correct  with  respect  to  those 
assertions.  An  initial  assertion  characterizes  con¬ 
ditions  expected  to  be  true  upon  entry  to  the  pro¬ 
gram  and  a  final  assertion  characterizes  conditions 
expected  to  be  true  upon  exit  from  the  program. 
When  a  program  contains  no  branches,  a  technique 
known  as  symbolic  execution  can  be  used  to  show 
that  the  truth  of  the  initial  assertion  upon  entry 
guarantees  the  truth  of  the  final  assertion  upon  exit. 
More  generally,  for  a  program  with  branches  one 
can  define  a  symbolic  execution  tree.  If  there  is  an 
upper  bound  on  the  number  of  times  each  loop  in 
such  a  program  may  be  executed,  a  proof  of  cor¬ 
rectness  can  be  given  by  a  simple  traversal  of  the 
(finite)  symbolic  execution  tree. 


However,  for  most  programs,  no  fixed  bound  on  the 
number  of  times  each  loop  is  executed  exists  and  the 
corresponding symbolic execution trees are infinite.

In  order  to  prove  the  correctness  of  such  programs, 
a more general assertion structure must be provided.
The symbolic execution tree of such programs
must be traversed inductively rather than explicitly.
This leads naturally to the use of additional
assertions which are called "inductive
assertions."

This  highly  readable  article  provides  a  gentle  intro¬ 
duction  to  three  important  areas:  program  correct¬ 
ness,  formal  verification,  and  symbolic  execution. 

The instructor who needs to learn about the relationship
of symbolic execution to verification should
begin  here.  Additional  references  can  be  found  in 
[Berztiss88].  This  is  ideal  reading  for  students. 

Hayes-Roth83 

Hayes-Roth,  Frederick,  and  Donald  Arthur  Water¬ 
man,  eds.  Building  Expert  Systems.  Reading, 
Mass.: Addison-Wesley, 1983.

This  book  outlines  verification  activities  applicable 
to  expert  systems. 

Hecht77

Hecht,  Matthew  S.  Flow  Analysis  of  Computer 
Programs. New York: Elsevier North-Holland,
1977. 

This  is  the  standard  text  covering  the  theory  of  data 
flow  analysis  as  applied  to  program  optimization. 
Application  of  data  flow  analysis  to  verification  is 
not  covered. 

Howden75

Howden, William E. “Methodology for the Generation
of Program Test Data.” IEEE Trans. Computers
C-24, 5 (May 1975), 554-560.

Abstract:  A  methodology  for  generating  program 
test  data  is  described.  The  methodology  is  a  model 
of  the  test  data  generation  process  and  can  be  used 
to  characterize  the  basic  problems  of  test  data  gen¬ 
eration.  It  is  well  defined  and  can  be  used  to  build 
an  automatic  test  data  generation  system. 

The  methodology  decomposes  a  program  into  a 
finite  set  of  classes  of  paths  in  such  a  way  that  an 
intuitively  complete  set  of  test  cases  would  cause  the 
execution  of  one  path  in  each  class.  The  test  data 
generation problem is theoretically unsolvable:
there  is  no  algorithm  which,  given  any  class  of 
paths,  will  either  generate  a  test  case  that  causes 
some  path  in  that  class  to  be  followed  or  determine 
that  no  such  data  exist.  The  methodology  attempts 
to  generate  test  data  for  as  many  of  the  classes  of 
paths as possible. It operates by constructing
descriptions of the input data subsets which cause
the  classes  of  paths  to  be  followed.  It  transforms 
these  descriptions  into  systems  of  predicates  which 
it  attempts  to  solve. 

This  paper  contains  a  nuts-and-bolts  presentation  of 
symbolic  execution  techniques. 

The  instructor  may  find  this  paper  useful,  but  dated. 
Students  who  are  not  enthusiastic  about  using  struc¬ 
tural  coverage  to  generate  test  data  should  avoid  this 
one. 

Howden76 

Howden, William E. “Reliability of the Path Analysis
Testing Strategy.” IEEE Trans. Software Eng.
SE-2,  3  (Sept.  1976),  208-215.  Reprinted  in 
[Miller81]. 

Abstract:  A  set  of  test  data  T  for  a  program  P  is 
reliable  if  it  reveals  that  P  contains  an  error  when¬ 
ever  P  is  incorrect.  If  a  set  of  tests  T  is  reliable  and 
P produces the correct output for each element of T
then  P  is  a  correct  program.  Test  data  generation 
strategies  are  procedures  for  generating  sets  of  test 
data.  A  testing  strategy  is  reliable  for  a  program  P 
if  it  produces  a  reliable  set  of  test  data  for  P.  It  is 
proved  that  an  effective  testing  strategy  which  is 
reliable for all programs cannot be constructed. A
description  of  the  path  analysis  testing  strategy  is 
presented.  In  the  path  analysis  strategy  data  are 
generated  which  cause  different  paths  in  a  program 
to  be  executed.  A  method  for  analyzing  the 
reliability  of  path  testing  is  introduced.  The  method 
is  used  to  characterize  certain  classes  of  programs 
and  program  errors  for  which  the  path  analysis 
strategy  is  reliable.  Examples  of  published  incor¬ 
rect  programs  are  included. 

This  is  an  excellent  paper,  which  established  much 
of  the  terminology  and  influenced  much  of  the  work 
in  path  testing. 

This  is  essential  reading  for  both  the  instructor  and 
student. 

Howden77

Howden, William E. “Symbolic Testing and the
DISSECT Symbolic Evaluation System.” IEEE
Trans. Software Eng. SE-3, 4 (July 1977), 266-278.
Reprinted in [Miller81].

Abstract:  Symbolic  testing  and  a  symbolic  evalu¬ 
ation  system  called  DISSECT  are  described.  The 
principle  features  of  DISSECT  are  outlined.  The 
results  of  two  classes  of  experiments  in  the  use  of 
symbolic  evaluation  are  summarized.  Several 
classes  of  program  errors  are  defined  and  the 
reliability  of  symbolic  testing  in  finding  bugs  is  re¬ 
lated  to  the  classes  of  errors.  The  relationship  of 
symbolic evaluation systems like DISSECT to
classes of program errors and to other kinds of program
testing and program analysis tools is also discussed.
Desirable improvements in DISSECT,
whose  importance  was  revealed  by  the  experiments, 
are  mentioned. 

This  paper  provides  a  detailed  look  into  the 
strengths and weaknesses of a symbolic execution
system.  Several  interesting  notions  are  introduced, 
such  as  using  two-dimensional  output  to  improve 
readability of symbolic output and using a path description
language. For more detailed information
on the DISSECT system, see [Howden78b].

The  paper  is  necessary  only  for  in-depth  under¬ 
standing  of  symbolic  execution.  It  is  easily  under¬ 
stood  by  students. 

Howden78a 

Howden,  William  E.  “Theoretical  and  Empirical 
Studies  of  Program  Testing.”  IEEE  Trans.  Software 
Eng. SE-4, 4 (July 1978), 293-298.

Abstract:  Two  approaches  to  the  study  of  program 
testing  are  described.  One  approach  is  theoretical 
and  the  other  empirical.  In  the  theoretical  ap¬ 
proach  situations  are  characterized  in  which  it  is 
possible  to  use  testing  to  formally  prove  the  correct¬ 
ness  of  programs  or  the  correctness  of  properties  of 
programs.  In  the  empirical  approach  testing  strate¬ 
gies  reveal  the  errors  in  a  collection  of  programs. 

A  summary  of  the  results  of  two  research  projects 
which  investigated  these  approaches  are  presented. 
The differences between the two approaches are
discussed  and  their  relative  advantages  and  dis¬ 
advantages  are  compared. 

This  paper  is  recommended  reading  for  the  instruc¬ 
tor  who  wishes  to  compare  the  theoretical  approach 
with  the  empirical  approach.  It  is  readily  under¬ 
stood  by  students. 

Howden78b 

Howden,  William  E.  “DISSECT — A  Symbolic  Eval¬ 
uation  and  Program  Testing  System.”  IEEE  Trans. 
Software Eng. SE-4, 1 (Jan. 1978), 70-73.

Abstract:  The  basic  features  of  the  DISSECT  sym¬ 
bolic  testing  tool  are  described.  Usage  procedures 
are  outlined  and  the  special  advantages  of  the  tool 
are summarized. Cost estimates for using the tool
are  provided  and  the  results  of  experiments  to  de¬ 
termine  its  effectiveness  are  included.  The  back¬ 
ground  and  history  of  the  development  of  the  tool 
are  outlined.  The  availability  of  the  tool  is  de¬ 
scribed  and  a  listing  of  reference  materials  is  in¬ 
cluded. 

This paper provides detailed information on the use
of a batch-oriented symbolic execution system. For
a broader perspective, see [Howden77] and [Muchnick81].

The paper is necessary only for in-depth understanding
of symbolic execution. It should be read
with [Clarke76].

Howden78c 

Howden, William E. “Algebraic Program Testing.”
Acta Informatica 10, 1 (1978), 53-66.

Abstract:  An  approach  to  the  study  of  program 
testing is introduced in which program testing is
treated  as  a  special  kind  of  equivalence  problem.  In 
this  approach,  classes  of  programs  P*  and  associ¬ 
ated  classes  of  test  sets  T*  are  defined  which  have 
the  property  that  if  two  programs  P  and  Q  in  P* 
agree on a set of tests from T*, then P and Q are
computationally  equivalent.  The  properties  of  a 
class  P*  and  the  associated  class  T*  can  be  thought 
of  as  defining  a  set  of  assumptions  about  a 
hypothetical  correct  version  Q  of  a  program  P  in 
P*.  If  the  assumptions  are  valid  then  it  is  possible 
to prove the correctness of P by testing. The main
result  of  the  paper  is  an  equivalence  theorem  for 
classes  of  programs  which  carry  out  sequences  of 
computations  involving  the  elements  of  arrays. 

This  reading  is  for  expert  knowledge. 

Howden80a

Howden, William E. “Applicability of Software Validation
Techniques to Scientific Programs.” ACM
Trans. Prog. Lang. and Syst. 2, 3 (July 1980), 307-320.
Reprinted in [Miller81].

Abstract:  Error  analysis  involves  the  examination 
of  a  collection  of  programs  whose  errors  are 
known.  Each  error  is  analyzed  and  validation  tech¬ 
niques  which  would  discover  the  error  are  identi¬ 
fied.  The  errors  that  were  present  in  version  five  of 
a  package  of  Fortran  scientific  subroutines  and 
then  later  corrected  in  version  six  were  analyzed. 

An  integrated  collection  of  static  and  dynamic  anal¬ 
ysis  methods  would  have  discovered  the  error  in 
version  five  before  its  release.  An  integrated  ap¬ 
proach  to  validation  and  the  effectiveness  of  indi¬ 
vidual  methods  are  discussed. 

The  author  gives  an  excellent  description  of  what 
errors  are  discovered  by  what  techniques. 

This  paper  is  essential  reading  for  the  instructor  and 
student  alike. 

Howden80b

Howden, William E. “Functional Program Testing.”
IEEE Trans. Software Eng. SE-6, 2 (March 1980),
162-169.

Abstract:  An  approach  to  functional  testing  is  de¬ 
scribed  in  which  the  design  of  a  program  is  viewed 
as an integrated collection of functions. The selection
of test data depends on the functions used in the
design and on the value spaces over which the functions
are defined. The basic ideas on the method
were  developed  during  the  study  of  a  collection  of 
scientific  programs  containing  errors.  The  method 
was the most reliable testing technique for discovering
the errors. It was found to be significantly more
reliable  than  structural  testing.  The  two  techniques 
are  compared  and  their  relative  advantages  and 
limitations  are  discussed. 

By  “functional  program  testing,”  Howden  means 
testing  those  aspects  of  a  program  that  have  any 
form  of  external  specification,  including  design 
documents  or  even  comments  within  the  code. 

This  paper  is  a  precursor  of  [Howden86]. 

Howden82 

Howden, William E. “Weak Mutation Testing and
Completeness of Test Sets.” IEEE Trans. Software
Eng. SE-8, 4 (July 1982), 371-379.

Abstract:  Different  approaches  to  the  generation  of 
test  data  are  described.  Error-based  approaches 
depend  on  the  definition  of  classes  of  commonly  oc¬ 
curring  program  errors.  They  generate  tests  which 
are specifically designed to determine if particular
classes  of  errors  occur  in  a  program.  An  error- 
based  method  called  weak  mutation  testing  is  de¬ 
scribed.  In  this  method,  tests  are  constructed  which 
are  guaranteed  to  force  program  statements  which 
contain  certain  classes  of  errors  to  act  incorrectly 
during  the  execution  of  the  program  over  those 
tests.  The  method  is  systematic,  and  a  tool  can  be 
built  to  help  the  user  apply  the  method.  It  is  exten¬ 
sible  in  the  sense  that  it  can  be  extended  to  cover 
additional  classes  of  errors.  Its  relationship  to 
other  software  testing  methods  is  discussed.  Ex¬ 
amples  are  included. 

Different  approaches  to  testing  involve  different 
concepts  of  the  adequacy  or  completeness  of  a  set 
of  tests.  A  formalism  for  characterizing  the  com¬ 
pleteness  of  test  sets  that  are  generated  by  error- 
based  methods  such  as  weak  mutation  testing  as 
well  as  the  test  sets  generated  by  other  testing  meth¬ 
ods  is  introduced.  Error-based,  functional,  and 
structural  testing  emphasize  different  approaches  to 
the  test  data  generation  problem.  The  formalism 
which is introduced in the paper can be used to
describe their common basis and their differences.

Weak  mutation  testing  provides  a  viable  alternative 
to  its  more  expensive  cousin,  mutation  testing,  and 
bears  close  resemblance  to  the  system  described  in 
[Hamlet77a].  This  paper  formalizes  the  notion  of 
completeness  of  a  test  set  based  on  its  ability  to 
detect  local  changes  to  the  code.  A  good  com¬ 
parison  of  testing  methods  is  made,  using  notation 
introduced  in  the  paper.  The  paper  is  more  easily 
understood if [DeMillo78], [White80], and [Foster80]
are read first.

This paper is recommended for the instructor, especially
if error-based or fault-based testing is to be
covered  in  depth.  Given  sufficient  background,  stu¬ 
dents  should  find  the  paper  accessible.  It  could 
form  the  basis  of  a  class  project  to  develop  a  weak 
mutation  system. 
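
As a starting point for such a project, the following minimal sketch shows the local check that distinguishes weak mutation from full mutation testing: a test counts against a mutant as soon as the mutated component computes a different value, whether or not the difference reaches the program output. This is not Howden's tool; the names `original_predicate`, `mutant_predicate`, `weakly_kills`, and `surviving` are hypothetical, and only one mutant (a changed relational operator) is assumed.

```python
def original_predicate(x, y):
    # Component under test: a relational expression inside some program.
    return x < y


def mutant_predicate(x, y):
    # Hypothetical mutant: the relational operator < replaced by <=.
    return x <= y


def weakly_kills(test, original, mutant):
    """Return True if the component and its mutant compute different
    values on this test's state (the weak mutation condition)."""
    return original(*test) != mutant(*test)


def surviving(tests, original, mutant):
    """Return True if no test in the set distinguishes the mutant locally."""
    return not any(weakly_kills(t, original, mutant) for t in tests)


if __name__ == "__main__":
    weak_set = [(1, 2), (3, 2)]      # never exercises the boundary x == y
    better_set = [(1, 2), (2, 2)]    # includes the boundary case
    print(surviving(weak_set, original_predicate, mutant_predicate))
    print(surviving(better_set, original_predicate, mutant_predicate))
```

Only the boundary input, where the two operands are equal, separates the two operators, which is why the first test set leaves the mutant alive.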

Howden86 

Howden, William E. “A Functional Approach to
Program Testing and Analysis.” IEEE Trans. Software
Eng. SE-12, 10 (Oct. 1986), 997-1005.

Abstract:  An  integrated  approach  to  testing  is  de¬ 
scribed  which  includes  both  static  and  dynamic 
analysis  methods  and  which  is  based  on  theoretical 
results  that  prove  both  its  effectiveness  and  efficien¬ 
cy.  Programs  are  viewed  as  consisting  of  collec¬ 
tions  of  functions  that  are  joined  together  using 
elementary  functional  forms  or  complex  functional 
structures. 

Functional  testing  is  identified  as  the  input-output 
analysis  of  functional  forms.  Classes  of  faults  are 
defined  for  these  forms  and  results  presented  which 
prove  the  fault  revealing  effectiveness  of  well  de¬ 
fined  sets  of  tests. 

Functional  analysis  is  identified  as  the  analysis  of 
the sequences of operators, functions, and data type
transformations  which  occur  in  functional  struc¬ 
tures.  Functional  trace  analysis  involves  the  ex¬ 
amination  of  the  sequences  of  function  calls  which 
occur  in  a  program  path;  operator  sequence  anal¬ 
ysis  the  examination  of  the  sequences  of  operators 
on  variables,  data  structures,  and  devices;  and  data 
type  transformation  analysis  the  examination  of  the 
sequences  of  transformations  on  data  types.  Theo¬ 
retical  results  are  presented  which  prove  that  it  is 
only  necessary  to  look  at  interfaces  between  pairs  of 
operators and data type transformations in order to
detect  the  presence  of  operator  or  data  type  se¬ 
quencing  errors.  The  results  depend  on  the  defini¬ 
tion  of  normal  forms  for  operator  and  data  type 
sequencing  diagrams. 

This  paper  represents  the  culmination  of  the  devel¬ 
opment  of  Howden’s  ideas  on  program  testing  as  a 
full-blown  theory.  It  summarizes  his  book 
[Howden87] and should be consulted before selecting
the book in a course. By an interesting twist of
terminology,  Howden  has  managed  to  incorporate 
all  of  structural  testing  into  functional  testing.  He 
presumes  the  availability  of  external  functions  that 
specify  the  behavior  of  components  of  the  program, 
even  those  as  small  as  an  expression.  Thus,  con¬ 
ventional  structural  issues  such  as  branch  testing  are 
converted  into  questions  like,  “Does  this  condition 
compute  this  (externally  defined)  function?”  Of 
course,  the  existence  of  these  external  functions  for 
every  line  of  code  can  be  questioned,  but  Howden 
has a quick reply: you can use the code to generate
the function! While such sleight-of-hand may be
disturbing at first, it is clear that in some cases this
procedure  is  appropriate,  as  when  a  section  of  code 
fits  a  standard  paradigm  and  is  headed  by  a  com¬ 
ment  such  as  “sort  list.”  To  understand  Howden’s 
development  fully,  it  is  necessary  to  see  his 
progress through several papers, especially [Howden76],
[Howden80b], and [Howden82].

This  paper  is  essential  reading  for  the  instructor. 
The  presentation  is  at  such  a  high  level  that  it  will 
be difficult for an uninitiated student to understand,
even  though  it  is  very  well  written. 

Howden87 

Howden, William E. Functional Program Testing
and  Analysis.  New  York:  McGraw-Hill,  1987. 

This  book  contains  an  excellent  chapter  on  theoret¬ 
ical  foundations  of  program  testing,  including  mate¬ 
rial  found  nowhere  else.  The  model  of  functional 
testing  and  analysis  presented  in  the  book  requires 
detailed  internal  specifications  of  behavior,  how¬ 
ever,  which  are  seldom  available.  Extensions  to  the 
model  are  seen  in  [Howden89]. 

Howden89 

Howden,  William  E.  “Validating  Programs  Without 
Specifications.”  ACM  Software  Eng.  Notes  14,  8 
(Dec.  1989),  2-9. 

This  article  does  not  contain  an  abstract,  but  its  con¬ 
tents  are  summarized  in  the  following  excerpt: 

In  the  error  based  approach  to  program  testing 
and analysis, the focus is on errors that a programmer
or designer may make during the
software  development  process,  and  on  tech¬ 
niques  that  can  be  used  to  detect  their  occur¬ 
rence.  ...  It  is  often  the  case  that  a  program  is 
constructed  without  any  formal,  detailed  spec¬ 
ification.  In  this  case  the  code  itself  is  the 
only  complete  specification.  This  means  that 
the  only  way  to  verify  such  a  program  is  to 
ensure that no errors were made by the programmer
during programming. The term
“errors” here means errors that occur due to
human  fallibility.  This  requires  that  we  study 
the  ways  in  which  humans  make  mistakes  in 
the  construction  of  artifacts,  and  then  build 
methods  to  detect  when  they  have  occurred. ... 

We  have  used  a  simple  model  in  which 
human  errors  are  classified  as  being  either  er¬ 
rors  of  decomposition  or  errors  of  abstraction. 

...  Flavor  analysis  is  a  kind  of  dynamic  type 
checking.  It  allows  the  programmer  to  docu¬ 
ment  properties  of  objects  that  change  during 
the  operation  of  a  program,  and  to  check  if 
assumptions  about  an  object’s  current  set  of 
properties are correct.

This article provides excellent motivation for the
use of flavor analysis in large systems for the detection
of decomposition errors. It complements the
work found in [Howden90].

Howden90

Howden,  William  E.  “Comments  Analysis  and  Pro¬ 
gramming  Errors.”  IEEE  Trans.  Software  Eng.  16,  1 
(Jan.  1990),  72-81. 

Abstract:  Software  validation  is  treated  as  the 
problem  of  detecting  errors  that  programmers  make 
during  the  software  development  process.  This  in¬ 
cludes  fault  detection,  in  which  the  focus  is  on  tech¬ 
niques  for  detecting  the  occurrence  of  local  errors 
which  result  in  well  defined  classes  of  program 
statement  faults.  It  also  includes  detecting  other 
kinds  of  errors,  such  as  decomposition  errors. 
These  occur  when  there  is  an  inconsistency  between 
two  parts  of  a  program  and  are  the  result  of  a  false 
assumption  made  in  one  part  of  the  program  about 
the  properties  of  some  other  part.  The  main  focus 
of  the  paper  is  on  a  decomposition  error  analysis 
technique  called  comments  analysis.  In  this  tech¬ 
nique,  errors  are  detected  by  analyzing  special 
classes of program comments. Comments analysis
has  been  applied  to  a  variety  of  different  kinds  of 
systems,  including  both  a  data  processing  program 
and  an  avionics  real-time  program.  The  use  of 
comments  analysis  for  sequential  and  concurrent 
systems  is  discussed  and  the  basic  features  of  com¬ 
ments  analysis  tools  are  summarized.  The  relation¬ 
ship  of  comments  analysis  to  other  techniques,  such 
as  event  sequence  analysis,  are  discussed,  and  the 
differences  between  it  and  earlier  work  are  ex¬ 
plained. 

This paper is not primarily directed to unit testing.
It is included here because it illustrates one practical
method of employing error-based knowledge in the
testing process, especially in the context of concurrent
programming. Related work is found in [Howden89].

Huang75 

Huang, J. C. “An Approach to Program Testing.”
ACM Computing Surveys 7, 3 (Sept. 1975), 113-128.
Reprinted in [Miller81].

Abstract:  One  of  the  practical  methods  commonly 
used  to  detect  the  presence  of  errors  in  a  computer 
program  is  to  test  it  for  a  set  of  test  cases.  The 
probability  of  discovering  errors  through  testing 
can  be  increased  by  selecting  test  cases  in  such  a 
way  that  each  and  every  branch  in  the  flowchart 
will  be  traversed  at  least  once  during  the  test.  This 
tutorial  describes  the  problems  involved  and  the 
methods that can be used to satisfy the test requirement.

This paper discusses a method for determining path
conditions to enable achievement of branch coverage.

The  paper  is  very  easy  to  understand  and  should 
cause  no  problems  for  students.  It  will  introduce 
them  to  predicate  calculus  notation  for  expressing 
path  conditions.  It  is  recommended  reading  for 
both  instructor  and  students. 
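
The idea of a path condition can be illustrated with a small sketch. It is not taken from the paper: the function `classify`, the table `PATH_CONDITIONS`, and the brute-force helper `find_input` are hypothetical, and the path conditions were derived by hand by substituting the assignment `x = x - y` into the second predicate.

```python
def classify(x, y):
    """Unit under test with two decisions; returns the branches taken."""
    taken = []
    if x > 0:
        taken.append("B1-true")
        x = x - y
    else:
        taken.append("B1-false")
    if x > y:
        taken.append("B2-true")
    else:
        taken.append("B2-false")
    return taken


# One path condition per path, written over the *input* values of x and y.
PATH_CONDITIONS = {
    ("B1-true", "B2-true"): lambda x, y: x > 0 and x - y > y,
    ("B1-true", "B2-false"): lambda x, y: x > 0 and not (x - y > y),
    ("B1-false", "B2-true"): lambda x, y: not (x > 0) and x > y,
    ("B1-false", "B2-false"): lambda x, y: not (x > 0) and not (x > y),
}


def find_input(condition, values=range(-3, 4)):
    """Search a small integer grid for an input satisfying the condition."""
    for x in values:
        for y in values:
            if condition(x, y):
                return (x, y)
    return None


if __name__ == "__main__":
    for path, condition in PATH_CONDITIONS.items():
        test_input = find_input(condition)
        print(path, test_input, classify(*test_input))
```

Any two paths that together take both outcomes of each decision achieve branch coverage here; the sketch enumerates all four paths only to show that each condition selects inputs that follow the intended path.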

IEEE83

IEEE.  IEEE  Standard  for  Software  Test  Docu¬ 
mentation,  ANSI/IEEE  Std  829-1983.  New  York: 
IEEE,  1983. 

Test  documentation  analogous  to  the  documentation 
of the traditional waterfall life-cycle development
model  is  defined  and  illustrated  in  this  standard. 

IEEE87 

IEEE.  IEEE  Standard  for  Software  Unit  Testing, 
ANSI/IEEE  Std  1008-1987.  New  York:  IEEE, 
1987. 

The  processes  and  products  of  unit  testing  are  de¬ 
fined  and  illustrated  in  this  standard. 

IEEE90 

IEEE.  IEEE  Standard  Glossary  of  Software  Engi¬ 
neering  Terminology,  ANSI/IEEE  Std  610.12-1990. 
New  York:  IEEE,  1990. 

This is a revision and re-designation of an earlier
glossary (ANSI/IEEE Std 729-1983). The “Corrected
Edition” is dated Feb. 1991, but it retains
“1990”  in  its  number. 

Students  should  not  only  learn  and  employ  accept¬ 
able  terminology,  but  they  should  also  learn  why 
standardized terminology is important. Comparing
definitions here with those that appear in books and
papers  will  help  them  learn  both  lessons. 

Jachner84 

Jachner,  Jacek,  and  Vinod  K.  Agarwal.  “Data  Flow 
Anomaly  Detection.”  IEEE  Trans.  Software  Eng. 
SE-10, 4  (July  1984),  432-437. 

Abstract:  The  occurrence  of  a  data  flow  anomaly  is 
often  an  indication  of  the  existence  of  a  program¬ 
ming  error.  The  detection  of  such  anomalies  can  be 
used  for  detecting  errors  and  to  upgrade  software 
quality.  This  paper  introduces  a  new,  efficient  algo¬ 
rithm  capable  of  detecting  anomalous  data  flow 
patterns  in  a  program  represented  by  a  graph.  The 
algorithm  based  on  static  analysis  scans  the  paths 
entering  and  leaving  each  node  of  the  graph  to 
reveal anomalous data action combinations. An algorithm
implementing this type of approach was
proposed by Fosdick and Osterweil [2]. Our approach
presents a general framework which not
only fills a gap in the previous algorithm, but also
provides time and space improvements.

This paper corrects a problem in [Fosdick76] and
cannot be understood without having read that article.

Instructors  who  use  [Fosdick76]  must  also  read  this 
paper.  The  paper  opens  up  the  possibility  of  a 
meta-discussion  about  the  need  to  analyze  papers 
critically.  The  shock  effect  on  students  of  the 
reliability  of  published  papers  is  not  to  be  under¬ 
estimated.  Other  illustrations  of  the  need  for  critical 
analysis  can  be  found  in  [Gerhart76],  [WeyukerSO], 
and  [Zweben89]. 
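
For readers new to the topic, the kind of anomaly at issue can be shown on a single execution path. The sketch below is far simpler than the graph algorithms of either paper: it assumes a path is given as a list of (action, variable) pairs with actions `d` (define), `r` (reference), and `u` (undefine), and it assumes every variable starts out undefined. The names `ANOMALIES` and `find_anomalies` are hypothetical.

```python
# Anomalous pairs of consecutive actions on the same variable.
ANOMALIES = {
    ("u", "r"): "reference to undefined variable",
    ("d", "d"): "redefinition without intervening reference",
    ("d", "u"): "definition never referenced",
}


def find_anomalies(path):
    """Scan one path and report (position, variable, description) for each
    anomalous pair of consecutive actions on the same variable."""
    last_action = {}
    found = []
    for position, (action, variable) in enumerate(path):
        previous = last_action.get(variable, "u")  # assumed initially undefined
        description = ANOMALIES.get((previous, action))
        if description is not None:
            found.append((position, variable, description))
        last_action[variable] = action
    return found


if __name__ == "__main__":
    path = [("d", "x"), ("d", "x"), ("r", "x"),
            ("r", "y"), ("d", "z"), ("u", "z")]
    for anomaly in find_anomalies(path):
        print(anomaly)
```

A static analyzer must establish such facts for all paths through the flow graph without enumerating them, which is the problem the two papers address.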

Jalote89 

Jalote,  Pankaj.  “Testing  the  Completeness  of  Speci¬ 
fications.”  IEEE  Trans.  Software  Eng.  15,  5  (May 
1989),  526-31. 

Abstract:  Specifications  are  means  to  define  for¬ 
mally  the  behavior  of  a  system  or  a  system  compo¬ 
nent.  Completeness  is  a  desirable  property  for 
specifications.  In  this  paper,  we  describe  a  system 
that  tests  for  the  completeness  of  axiomatic  specifi¬ 
cations  of  abstract  data  types.  For  testing,  the  sys¬ 
tem  generates  a  set  of  test  cases  and  an  implemen¬ 
tation  of  the  data  type  from  the  specifications.  The 
generated  implementation  is  such  that  if  the  specifi¬ 
cations  are  not  complete,  the  implementation  is  not 
complete,  and  the  behavior  of  all  of  the  sequences 
of  valid  operations  on  the  data  type  is  not  defined. 
This  implementation  is  tested  with  the  generated 
test  cases  to  detect  the  incompleteness  of  specifi¬ 
cations.  The  system  is  implemented  on  a  VAX  sys¬ 
tem  running  Unix. 

The  paper  illustrates  fault-based  testing  of  a  specifi¬ 
cation.  The  fault  under  consideration  is  one  of 
missing  axioms.  A  brief  overview  of  algebraic 
specifications  is  given.  The  paper  defines  an  ADT 
to  be  sufficiently  complete  if  and  only  if,  for  every 
possible  instance  of  the  abstract  type,  the  result  of 
all  behavior  operations  is  defined  by  the  specifi¬ 
cations.  The  paper  considers  only  incompleteness 
caused  by  missing  axioms.  It  presents  heuristics 
based  on  test  data  generated  from  the  syntactic  por¬ 
tion  of  the  specification  for  discovering  that  omis¬ 
sion.  The  paper  cites  further  references  to  the  im¬ 
plementation  of  the  system  described. 

The  paper  raises  some  interesting  problems,  such  as 
whether  heuristics  exist  for  discovering  other 
classes  of  faults  in  algebraic  specifications,  and 
whether  testing  categorically  proves  the  absence  of 
those  faults. 

This  paper  is  appropriate  for  students  only  after  they 
have been exposed to the principles of algebraic
specifications. It can be used to illustrate how fault-based
testing techniques can be applied to testing
specifications.

Korel87

Korel,  Bogdan.  “The  Program  Dependence  Graph  in 
Static  Program  Testing.”  Information  Processing 
Letters  24,  2  (Jan.  1987),  103-108. 

Abstract:  In  this  paper,  new  techniques  for  static 
program  testing  are  presented.  The  techniques  are 
based  on  the  program  dependence  graph,  which 
models  the  structure  of  the  program  in  terms  of  data 
and  control  dependences  between  program  instruc¬ 
tions.  First,  a  new  approach  for  redundant  code 
detection  is  proposed.  The  main  idea  is  based  on 
the  observation  that  each  program  instruction 
should  have  influence  on  the  output  of  the  program 
otherwise  it  is  considered  redundant.  Second,  an 
input  output  relationship  analysis,  which  reflects 
the  influence  of  specific  input  data  on  specific  out¬ 
put  data  of  the  program,  is  proposed.  It  is  shown 
that the presented techniques can increase the number
of detectable errors as compared with error detection
through data flow analysis alone.

This  paper  is  crucial  to  understanding  the  work  of 
Korel  and  others  who  seek  to  use  data  flow  infor¬ 
mation  in  novel  ways.  The  paper  is  self-contained 
and is essential reading for anyone studying methods
of representing data flow information.

Korel88a 

Korel, Bogdan, and Janusz Laski. “Dynamic Program
Slicing.” Information Processing Letters 29, 3
(Oct. 1988), 155-163.

Abstract:  A  dynamic  program  slice  is  an  ex¬ 
ecutable  subset  of  the  original  program  that  pro¬ 
duces  the  same  computations  on  a  subset  of  selected 
variables  and  inputs.  It  differs  from  the  static  slice 
(Weiser,  1982,  1984)  in  that  it  is  entirely  defined  on 
the  basis  of  a  computation.  The  two  main  advan¬ 
tages  are  the  following:  Arrays  and  dynamic  data 
structures  can  be  handled  more  precisely  and  the 
slice  can  be  significantly  reduced,  leading  to  a  finer 
localization  of  the  fault.  The  approach  is  being 
investigated  as  a  possible  extension  of  the  debug¬ 
ging  capabilities  of  STAD,  a  recently  developed 
System  for  Testing  and  Debugging  (Korel  and 
Laski, 1987; Laski, 1987).

This paper should be read before [Korel90b].

Korel88b 

Korel,  Bogdan,  and  Janusz  Laski.  “STAD — A  Sys¬ 
tem  For  Testing  and  Debugging:  User  Perspective.” 
Proc. Second Workshop on Software Testing, Verification,
and Analysis. Washington, D.C.: IEEE
Computer  Society  Press,  1988,  13-20. 

Abstract:  A  recently  developed,  experimental,  inte¬ 
grated  System  for  Testing  and  Debugging  is 
presented. Its testing part supports three data flow
coverage criteria. The debugging part guides the
programmer  in  the  localization  of  faults  by  gener¬ 
ating  and  interactively  verifying  hypotheses  about 
their  location. 

The  three  data  flow  coverage  criteria  referred  to  are 
U-  and  L-context  testing  and  chain  testing.  The  tool 
reports  coverage  in  terms  of  these  three  criteria. 

This paper extends the concepts found in [Laski83].
It would serve well as an example of program instrumentation.

Korel90a 

Korel, Bogdan. “Automated Software Test Data
Generation.” IEEE Trans. Software Eng. 16, 8 (Aug.
1990), 870-879.

Abstract:  Test  data  generation  in  program  testing 
is  the  process  of  identifying  a  set  of  test  data  which 
satisfies  given  testing  criterion.  Most  of  the  existing 
test  data  generators  ...  use  symbolic  evaluation  to 
derive  test  data.  However,  in  practical  programs 
this  technique  frequently  requires  complex  al¬ 
gebraic  manipulations,  especially  in  the  presence  of 
arrays.  In  this  paper  we  present  an  alternative  ap¬ 
proach  of  test  data  generation  which  is  based  on 
actual  execution  of  the  program  under  test,  function 
minimization  methods,  and  dynamic  data  flow  anal¬ 
ysis.  Test  data  are  developed  for  the  program  using 
actual  values  of  input  variables.  When  the  program 
is  executed,  the  program  execution  flow  is 
monitored. If during program execution an undesirable
execution flow is observed (e.g., the
“actual” path does not correspond to the selected
control path) then function minimization search algorithms
are used to automatically locate the values
of  input  variables  for  which  the  selected  path  is 
traversed.  In  addition,  dynamic  data  flow  analysis 
is  used  to  determine  those  input  variables  respon¬ 
sible  for  the  undesirable  program  behavior,  leading 
to  significant  speedup  of  the  search  process.  The 
approach  to  generating  test  data  is  then  extended  to 
programs  with  dynamic  data  structures,  and  a 
search  method  based  on  dynamic  data  flow  analysis 
and  backtracking  is  presented.  In  the  approach  de¬ 
scribed  in  this  paper,  values  of  array  indexes  and 
pointers  are  known  at  each  step  of  program  execu¬ 
tion,  and  this  approach  exploits  this  irtformation  to 
overcome  difficulties  of  array  and  pointer  handling; 
as  a  result,  the  effectiveness  of  test  data  generation 
can  be  significantly  improved. 

This  is  an  excellent  paper,  which  presents  an  inno¬ 
vative  and  complex  approach  to  the  problem  of  test 
data  generation.  The  paper  addresses  some  of  the 
most  complex  issues  involving  arrays  and  dynamic 
data  structures.  Appropriate  references  to  the  im¬ 
plemented  systems  are  cited. 

A  sophisticated  background  in  data  structures  and 
data  flow  theory  is  required  to  read  this  paper. 
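
The core idea of turning a branch predicate into a function to be minimized can be sketched briefly. The following is an illustration under strong simplifying assumptions, not Korel's system: the branch function is written by hand instead of being obtained by monitoring execution, no dynamic data flow analysis is performed, and the search is a plain one-variable-at-a-time search with step doubling. All names (`under_test`, `branch_function`, `search_input`) are hypothetical.

```python
def under_test(x, y):
    # Program under test; the goal is an input that takes the first branch.
    if x + 2 * y > 20:
        return "target"
    return "other"


def branch_function(x, y):
    # Negative exactly when the target predicate x + 2*y > 20 holds.
    return 20 - (x + 2 * y)


def search_input(start, distance, max_steps=10_000):
    """Change one input variable at a time, doubling the step while the
    distance keeps decreasing; stop as soon as the distance is negative."""
    point = list(start)
    steps = 0
    while steps < max_steps:
        if distance(*point) < 0:
            return tuple(point)
        improved = False
        for i in range(len(point)):
            for direction in (1, -1):
                step = direction
                while steps < max_steps:
                    steps += 1
                    candidate = list(point)
                    candidate[i] += step
                    if distance(*candidate) >= distance(*point):
                        break  # no improvement in this direction
                    point = candidate
                    improved = True
                    if distance(*point) < 0:
                        return tuple(point)
                    step *= 2
        if not improved:
            return None  # stuck: no single-variable move reduces the distance
    return None


if __name__ == "__main__":
    found = search_input((0, 0), branch_function)
    print(found, under_test(*found))
```

Because the search works on concrete values, it never has to solve the predicate algebraically, which is the advantage over symbolic evaluation that the abstract claims.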


Korel90b 

Korel, Bogdan, and Janusz Laski. “Dynamic Slicing
of  Computer  Programs.”  J.  Syst.  and  Software  13,  3 
(Nov.  1990),  187-195. 

Abstract: Program slicing is a useful tool in program
debugging ... . Dynamic slicing introduced in
this  paper  differs  from  the  original  static  slicing  in 
that  it  is  defined  on  the  basis  of  a  computation.  A 
dynamic program slice is an executable part of the
original  program  that  preserves  part  of  the 
program's  behavior  for  a  specific  input  with  respect 
to  a  subset  of  selected  variables,  rather  than  for  all 
possible  computations.  As  a  result,  the  size  of  a 
slice  can  be  significantly  reduced.  Moreover,  the 
approach  allows  us  to  treat  array  elements  and 
fields  in  dynamic  records  as  individual  variables. 
This leads to a further reduction of the slice size.

The  application  of  dynamic  analysis  to  program 
slicing is clearly advantageous for debugging and
for  data  flow  testing. 

The paper is accessible to students only after reading
[Korel88a] and [Weiser84].
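
The benefit of treating array elements as individual variables can be seen in a small sketch. It is deliberately simplified and is not the algorithm of the paper: it follows only dynamic data dependences backward through a recorded trace, ignores control dependences, and does not guarantee an executable slice. The trace format and the name `dynamic_data_slice` are assumptions made for illustration.

```python
def dynamic_data_slice(trace, variable):
    """Return the statement numbers whose executed instances influenced the
    final value of `variable` through data dependences alone.

    trace: list of (statement_number, defined_name, used_names) tuples in
    execution order, with array elements named individually (e.g. "a[1]").
    """
    wanted = {variable}
    in_slice = set()
    for statement, defined, used in reversed(trace):
        if defined in wanted:
            in_slice.add(statement)
            wanted.discard(defined)   # this instance supplies the value
            wanted.update(used)       # now explain the values it read
    return in_slice


if __name__ == "__main__":
    # Recorded run of:  1: i = 1   2: a[0] = 5   3: a[1] = 7   4: x = a[i]
    trace = [
        (1, "i", []),
        (2, "a[0]", []),
        (3, "a[1]", []),
        (4, "x", ["i", "a[1]"]),
    ]
    print(sorted(dynamic_data_slice(trace, "x")))
```

A static slice that treats the array `a` as one variable would have to keep statement 2 as well; the recorded run shows that only `a[1]` was read.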

Laski83

Laski,  Janusz  W.,  and  Bogdan  Korel.  “A  Data  Flow 
Oriented  Program  Testing  Strategy.”  IEEE  Trans. 
Software  Eng.  SE-9,  3  (May  1983),  347-354. 

Abstract:  Some  properties  of  a  program  data  flow 
can  be  used  to  guide  program  testing.  The 
presented  approach  aims  to  exercise  use-definition 
chains  that  appear  in  a  program.  Two  such  data 
oriented  testing  strategies  are  proposed;  the  first 
involves  checking  liveness  of  every  definition  of  a 
variable  at  the  point(s)  of  its  possible  use;  the  sec¬ 
ond  deals  with  liveness  of  vectors  of  variables 
treated  as  arguments  to  an  instruction  or  program 
block. Reliability of these strategies is discussed
with  respect  to  a  program  containing  an  error. 

This  paper  provides  a  transition  from  the  use  of  data 
flow  to  detect  anomalies  in  programs  to  its  use  as  a 
method  for  selecting  and  evaluating  test  data. 

The  paper  should  be  read  by  the  instructor  and  is 
accessible  to  students.  The  instructor  should  em¬ 
phasize  the  difference  between  using  a  criterion  for 
evaluation  and  using  it  for  generation. 

Miller81

Miller,  Edward,  and  William  E.  Howden,  eds. 
Tutorial:  Software  Testing  &  Validation  Techniques, 
2nd  Ed.  New  York:  IEEE  Computer  Society  Press, 
1981. 

This  collection  of  articles  is  dated  and  out-of-print, 
but  it  contains  copies  of  many  of  the  older  articles 
discussed  in  this  module.  A  new  edition  is  in  prepa¬ 
ration. 




Mills75

Mills, Harlan D. “The New Math of Computer
Programming.” Comm. ACM 18, 1 (Jan. 1975),
43-48.

Abstract:  Structured  programming  has  proved  to 
be  an  important  methodology  for  systematic  pro¬ 
gram  design  and  development.  Structured  pro¬ 
grams  are  identified  as  compound  function  expres¬ 
sions  in  the  algebra  of  functions.  The  algebraic 
properties  of  these  function  expressions  permit  the 
reformulation  (expansion  as  well  as  reduction)  of  a 
nested  subexpression  independently  of  its  environ¬ 
ment,  thus  modeling  what  is  known  as  stepwise  pro¬ 
gram  refinement  as  well  as  program  execution. 
Finally,  structured  programming  is  characterized  in 
terms  of  the  selection  and  solution  of  certain 
elementary  equations  defined  in  the  algebra  of  func¬ 
tions.  These  solutions  can  be  given  in  general  for¬ 
mulas,  each  involving  a  single  parameter,  which 
display  the  entire  freedom  available  in  creating  cor¬ 
rect  structured  programs. 

The  functional  view  of  programs  is  introduced  in 
this  classic  paper.  The  paper  is  essential  reading  for 
the  instructor  and  student  alike  if  the  functional 
view  is  to  be  given  serious  consideration. 

Mills83 

Mills,  Harlan  D.  Software  Productivity.  Boston: 
Little,  Brown,  1983. 

This  collection  of  articles  on  the  subject  of  software 
processes  was  written  by  Harlan  Mills  over  a  period 
of  years.  Mills’s  seminal  article  on  error  seeding  is 
reprinted  here. 

Morell87 

Morell, Larry J. “A Model for Assessing Code-Based
Testing Techniques.” Proc. Fifth Ann. Pacific
Northwest Software Quality Conf. Portland, Ore.:
Lawrence & Craig, 1987, 309-326.

Abstract:  A  theory  of  fault-based  program  testing 
is  defined  and  explained.  Testing  is  fault-based 
when  it  seeks  to  demonstrate  that  prescribed  faults 
are  not  in  a  program.  It  is  assumed  here  that  a 
program  can  only  be  incorrect  in  a  limited  fashion 
specified  by  associating  alternate  expressions  with 
program  expressions.  Classes  of  alternate  expres¬ 
sions  can  be  infinite.  Substitution  of  an  alternate 
expression for a program expression yields an
alternate program that is potentially correct. The
goal  of  fault-based  testing  is  to  produce  a  test  set 
that  differentiates  the  program  from  each  of  its  al¬ 
ternates. 

A particular form of fault-based testing based on
symbolic execution is presented. In symbolic
testing program expressions are replaced by symbolic
alternatives that represent classes of alternate
expressions. The output from the system is an
expression in terms of the input and the symbolic
alternative. Equating this with the output from the
original program yields a propagation equation
whose solutions determine those alternatives which
are not differentiated by this test.

This  paper  contains  a  gentle  introduction  to  (sym¬ 
bolic)  fault-based  testing.  The  coupling  effect  is 
discussed  and  formally  characterized.  It  is  then 
shown  that  for  particular  classes  of  faults  and  pro¬ 
gram  constructs,  the  probability  is  small  that  a 
double  fault  remains  undetected  if  each  of  the  single 
faults  is  eliminated. 

This  paper  is  useful  to  those  interested  in  theoretical 
analysis  of  the  coupling  effect. 

Morell88 

Morell, Larry J. “Theoretical Insights into Fault-Based
Testing.” Proc. Second Workshop on Software
Testing, Verification, and Analysis. Washington,
D.C.: IEEE Computer Society Press, 1988, 45-62.

Abstract:  Testing  is  fault-based  when  its  goal  is  to 
demonstrate  the  absence  of  prespecified  faults.  This 
paper  presents  a  framework  that  characterizes 
fault-based  testing  schemes  based  on  how  many 
prespecified  faults  are  considered  and  on  the  con¬ 
textual  information  used  to  deduce  the  absence  of 
those  faults.  Established  methods  of  fault-based 
testing  are  placed  within  this  framework.  Most 
methods  either  are  limited  to  finite  fault  classes,  or 
focus  on  local  effects  of  faults  rather  than  global 
effects.  A  new  method  of  fault-based  testing  called 
symbolic testing is presented by which infinitely
many  prespecified  faults  can  be  proven  to  be  absent 
from  a  program  based  upon  the  global  effect  the 
faults  would  have  if  they  were  present.  Cir¬ 
cumstances  are  discussed  as  to  when  testing  with  a 
finite  test  set  is  sufficient  to  prove  that  infinitely 
many  prespecified  faults  are  not  present  in  a  pro¬ 
gram. 

Fault-based  testing  seeks  to  demonstrate  that  a 
given  program  is  unique  among  a  neighborhood  of 
programs  defined  by  classes  of  faults.  Fault-based 
testing  schemes  may  be  classified  according  to 
breadth of the neighborhood (finite or infinite) and
the  extent  of  the  propagation  (local  or  global)  used 
to  distinguish  the  programs  from  other  members  of 
the  neighborhood. 

This  paper  contains  proofs  of  the  theorems  cited  in 
[Morell90] and a useful history of fault-based testing.
After some background reading on symbolic
execution,  most  of  the  paper  should  be  understand¬ 
able  to  the  student.  The  theorems  require  familiar¬ 
ity  with  the  halting  problem  of  computability  the¬ 
ory. 

Morell90

Morell, Larry J. “A Theory of Fault-Based Testing.”
IEEE Trans. Software Eng. 16, 8 (Aug. 1990),
844-857.

Abstract:  A  theory  of  fault-based  testing  is  defined 
and  explained.  Testing  is  fault-based  when  it  seeks 
to  demonstrate  that  prescribed  faults  are  not  in  a 
program. It is assumed here that a program can
only be incorrect in a limited fashion specified by
associating alternate expressions with program expressions.
Classes of alternate expressions can be
infinite.  Substitution  of  an  alternate  expression  for 
a  program  expression  yields  an  alternate  program 
that  is  potentially  correct.  The  goal  of  fault-based 
testing  is  to  produce  a  test  set  that  differentiates  the 
program  from  each  of  its  alternates. 

A  particular  form  of  fault-based  testing  based  on 
symbolic  execution  is  presented.  In  symbolic  test¬ 
ing  program  expressions  are  replaced  by  symbolic 
alternatives  that  represent  classes  of  alternate  ex¬ 
pressions.  The  output  from  the  system  is  an  expres¬ 
sion  in  terms  of  the  input  and  the  symbolic  alter¬ 
native.  Equating  this  with  the  output  from  the  orig¬ 
inal  program  yields  a  propagation  equation  whose 
solutions  determine  those  alternatives  which  are  not 
differentiated  by  this  test.  Since  an  alternative  set 
can  be  infinite,  it  is  possible  that  no  finite  test  dif¬ 
ferentiates  the  program  from  all  its  alternates.  Cir¬ 
cumstances  are  described  as  to  when  this  is  decid¬ 
able. 

This paper extends the work of [Morell88] by including
analysis of both symbolic faults and symbolic
errors.  Prerequisite  reading  in  the  area  of  symbolic 
execution  may  be  necessary. 

Muchnick81

Muchnick, Steven S., and Neil D. Jones, eds.
Program Flow Analysis: Theory and Applications.
Englewood Cliffs, N.J.: Prentice-Hall, 1981.

This  book  delves  deeply  into  the  subject  of  data 
flow  analysis  and  many  areas  of  its  application  to 
testing,  including  static  analysis  tools  and  symbolic 
execution. 

This book is for experts.

Myers79 

Myers, Glenford J. The Art of Software Testing.
New York: John Wiley, 1979.

This  book  is  an  often-cited  reference  on  software 
testing.  Although  it  is  somewhat  dated,  students 
find  it  helpful  and  easy  to  read. 


Ntafos84 

Ntafos, Simeon C. “On Required Element Testing.”
IEEE Trans. Software Eng. SE-10, 6 (Nov. 1984),
795-803.

Abstract:  In  this  paper  we  introduce  two  classes  of 
program  testing  strategies  that  consist  of  specifying 
a  set  of  required  elements  for  the  program  and  then 
covering  those  elements  with  appropriate  test  in¬ 
puts.  In  general,  a  required  element  has  a  struc¬ 
tural  and  a  functional  component  and  is  covered  by 
a  test  case  if  the  test  case  causes  the  features  speci¬ 
fied  in  the  structural  component  to  be  executed  un¬ 
der  the  conditions  specified  in  the  functional  com¬ 
ponent.  Data  flow  analysis  is  used  to  specify  the 
structural  component  and  data  flow  interactions  are 
used  as  a  basis  for  developing  the  functional  com¬ 
ponent.  The  strategies  are  illustrated  with  examples 
and  some  experimental  evaluations  of  their  effec¬ 
tiveness  are  presented. 

The author establishes a general framework for integrating
structural testing with data flow information.

The  paper  could  be  useful  to  the  instructor,  but  it  is 
less  accessible  to  the  student.  It  may  be  helpful  to 
first  read  [Rapps85],  which  is  more  comprehensive 
in  its  treatment  of  approaches. 

Ntafos88 

Ntafos, Simeon C. “A Comparison of Some Structural
Testing Strategies.” IEEE Trans. Software Eng.
14, 6 (June 1988), 868-874.

Abstract:  In  this  paper  we  compare  a  number  of 
structural  testing  strategies  in  terms  of  their  relative 
coverage  of  the  program’s  structure  and  also  in 
terms  of  the  number  of  test  cases  needed  to  satisfy 
each  strategy.  We  also  discuss  some  of  the 
deficiencies  of  such  comparisons. 

This  paper  contains  an  extended  overview  of  data 
flow  testing  methods,  surveying  the  main  papers  in 
this  area.  It  also  corrects  a  mistake  in  an  earlier 
version of Ntafos’s k-dr testing strategy. The paper
extends the subsumption hierarchy introduced in
[Rapps85] by including TER^ = 1 (see [Woodward80]),
boundary-interior testing methods, and
k-dr testing (see [Ntafos84]).

Because  it  provides  a  historical  perspective  on  data 
flow  testing,  this  paper  could  be  used  as  the  first 
reading  in  the  area  of  data  flow  testing,  followed  by 
some  of  the  earlier  papers. 

Offutt89 

Offutt, A. Jefferson. “The Coupling Effect: Fact or
Fiction?” ACM Software Eng. Notes 14, 8 (Dec.
1989), 131-140.


Abstract:  Fault-based  testing  strategies  test  soft¬ 
ware  by  focusing  on  specific,  common  types  of  er¬ 
rors.  The  coupling  effect  states  that  test  data  sets 
that  detect  simple  types  of  faults  are  sensitive 
enough  to  detect  more  complex  types  of  faults.  This 
paper  describes  empirical  investigations  into  the 
coupling  effect  over  a  specific  domain  of  software 
faults.  All  the  results  from  the  investigation  support 
the  validity  of  the  coupling  effect.  The  major  con¬ 
clusion  from  this  investigation  is  that  by  explicitly 
testing  for  simple  faults,  we  are  also  implicitly  test¬ 
ing  for  more  complicated  faults.  This  gives  con¬ 
fidence  that  fault-based  testing  is  an  effective  means 
of  testing  software. 

The  difficulty  with  the  notion  of  coupling  effect  is 
the  imprecision  of  the  terms  “simple”  and  “com¬ 
plex.”  Offutt  uses  the  interpretation  of  these  terms 
given by Morell: simple faults are denoted by single
mutants; double faults are denoted by double
mutants. In no case examined in this study did any
non-equivalent double-order mutant survive two
sets of mutation-adequate test data. Theoretical
treatment of the coupling effect may be found in
[Morell87].

This  paper  is  easy  to  read,  but  it  requires  a  thorough 
background  in  mutation  testing.  Fruitful  class  dis¬ 
cussion  can  be  generated  concerning  the  experimen¬ 
tal  design  and  the  validity  of  the  conclusions  drawn. 
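
The single-mutant versus double-mutant reading of the coupling effect described above can be tried on a toy program. The following Python sketch is only an illustration under stated assumptions, not the experimental setup used in the paper: the program under test (a two-argument maximum), the three mutation sites, and the names `ORIGINAL`, `ALTERNATIVES`, `build`, `mutants`, and `survivors` are all hypothetical.

```python
import operator
from itertools import combinations, product

# Hypothetical program under test: max(a, b), written so that each
# mutable site (comparison, value returned in each branch) is a parameter.
ORIGINAL = {"compare": operator.gt, "then_pick": 0, "else_pick": 1}

# Assumed alternatives for each site; each substitution is one simple fault.
ALTERNATIVES = {
    "compare": [operator.ge, operator.lt, operator.le],
    "then_pick": [1],
    "else_pick": [0],
}

def build(config):
    """Turn a site configuration into a callable program."""
    def program(a, b):
        args = (a, b)
        if config["compare"](a, b):
            return args[config["then_pick"]]
        return args[config["else_pick"]]
    return program

def mutants(order):
    """Yield every configuration differing from ORIGINAL at exactly `order` sites."""
    for sites in combinations(ALTERNATIVES, order):
        for choice in product(*(ALTERNATIVES[s] for s in sites)):
            yield {**ORIGINAL, **dict(zip(sites, choice))}

def survivors(order, tests):
    """Return the mutants of the given order that no test distinguishes from ORIGINAL."""
    reference = build(ORIGINAL)
    return [m for m in mutants(order)
            if all(build(m)(a, b) == reference(a, b) for a, b in tests)]

# Two tests chosen to kill every non-equivalent single mutant.
TESTS = [(2, 1), (1, 2)]
single_survivors = survivors(1, TESTS)
double_survivors = survivors(2, TESTS)
print(len(single_survivors), len(double_survivors))
```

In this toy case the only surviving single mutant replaces `>` with `>=`, which is equivalent to the original for a maximum, and no double mutant survives. One small example proves nothing about the coupling effect in general; it only makes the terms of the discussion concrete.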

Osterweil76

Osterweil,  Leon  J.,  and  Lloyd  D.  Fosdick.  “DAVE — 

A  Validation  Error  Detection  and  Documentation 

System  for  Fortran  Programs.”  Software — Practice 
and Experience 6, 4 (Oct.-Dec. 1976), 473-486.

Reprinted  in  [Miller81]. 

Abstract:  This  paper  describes  DAVE,  a  system  for 
analyzing  Fortran  programs.  DAVE  is  capable  of 
detecting  the  symptoms  of  a  wide  variety  of  errors 
in  programs,  as  well  as  assuring  the  absence  of 
these  errors.  In  addition,  DAVE  exposes  and  docu¬ 
ments  subtle  data  relations  and  flows  within  pro¬ 
grams.  The  central  analytic  procedure  used  is  a 
depth  first  search.  DAVE  itself  is  written  in 
Fortran.  Its  implementation  at  the  University  of 
Colorado  and  some  early  experience  is  described. 

After  an  abrupt  introduction  to  data  flow  anomalies, 
the  paper  gives  two  algorithms  for  computing  the 
input/output  classification  of  a  variable.  The  rela¬ 
tionship  between  these  algorithms  and  the  detection 
of  data  flow  anomalies  is  not  immediately  obvious. 
[Fosdick76]  should  be  read  first  and  compared  with 
this  article.  The  algorithms  here  are  expressed  in  an 
Algol-like  language,  making  them  more  palatable 
than those in [Fosdick76].

The  paper  could  serve  as  detailed  reading  for  the 
instructor.  The  density  of  notation  makes  it  difficult 
for  the  student. 


Ould86 

Ould, Martyn A., and Charles Unwin, eds. Testing
in  Software  Development.  Cambridge,  England: 
Cambridge  University  Press,  1986. 

This  excellent  monograph  focuses  on  four  views  of 
testing:  the  manager’s,  the  user’s,  the  designer’s, 
and  the  programmer’s.  All  levels  of  testing 
(acceptance,  system,  integration,  and  unit)  are  dis¬ 
cussed. 

This  book  is  a  good  supplement  for  a  project- 
oriented  software  engineering  course.  When  sup¬ 
plemented  with  readings  from  the  literature,  it  pro¬ 
vides  a  sufficient  framework  for  a  course  in  soft¬ 
ware  testing. 

Perlman90 

Perlman,  Gary.  User  Interface  Development.  Cur¬ 
riculum  Module  SEI-CM-17-1.1,  Software  Engi¬ 
neering  Institute,  Carnegie  Mellon  University,  Pitts¬ 
burgh,  Pa.,  Jan.  1990. 

Capsule  Description:  This  module  covers  the  is¬ 
sues,  information  sources,  and  methods  used  in  the 
design,  implementation,  and  evaluation  of  user 
interfaces,  the  parts  of  software  systems  designed  to 
interact  with  people.  User  interface  design  draws 
on  the  experiences  of  designers,  current  trends  in 
input/output technology, cognitive psychology,
human  factors  (ergonomics)  research,  guidelines 
and  standards,  and  on  the  feedback  from  evaluating 
working  systems.  User  interface  implementation 
applies  modern  software  development  techniques  to 
building  user  interfaces.  User  interface  evaluation 
can  be  based  on  empirical  evaluation  of  working 
systems  or  on  the  predictive  evaluation  of  system 
design  specifications. 

Podgurski90

Podgurski, Andy, and Lori A. Clarke. “A Formal
Model of Program Dependences and Its Implications
for Software Testing, Debugging, and Maintenance.”
IEEE Trans. Software Eng. 16, 9 (Sept.
1990), 965-979.

Abstract:  A  formal,  general  model  of  program 
dependences  is  presented  and  used  to  evaluate  sev¬ 
eral  dependence-based  software  testing,  debugging, 
and  maintenance  techniques.  Two  generalizations 
of  control  and  data  flow  dependence,  called  weak 
and  strong  syntactic  dependence,  are  introduced 
and  related  to  a  concept  called  semantic  depend¬ 
ence.  Semantic  dependence  models  the  ability  of  a 
program  statement  to  affect  the  execution  behavior 
of  other  statements.  It  is  shown,  among  other 
things, that weak syntactic dependence is a necessary
but not sufficient condition for semantic dependence
and that strong syntactic dependence is a
necessary but not sufficient condition for a restricted
form of semantic dependence that is finitely
demonstrated.  These  results  are  then  used  to  sup¬ 
port  some  proposed  uses  of  program  dependences, 
to  controvert  others,  and  to  suggest  new  uses. 

This paper is highly recommended for its clear and
concise  definitions  in  the  area  of  data  flow  phenom¬ 
ena  (dependences).  The  substantial  effort  necessary 
to  understand  the  definitions  will  prove  a  useful  in¬ 
vestment  when  reading  other  data  flow  papers. 

This  paper  could  be  used  to  lay  the  mathematical 
foundation  necessary  for  understanding  data  flow  in 
programs. 

Powell82 

Powell,  Patricia  B.,  ed.  Software  Validation,  Verifi¬ 
cation,  and  Testing  Technique  and  Tool  Reference 
Guide,  NBS  Special  Publication  500-93.  Washing¬ 
ton,  D.C.:  National  Bureau  of  Standards,  1982. 

This  book  covers  most  of  the  testing  and  analysis 
techniques  covered  in  this  module.  The  techniques 
are  compared  as  to  their  effectiveness,  applicability, 
ease  of  learning,  and  costs.  The  assessments  are 
accurate  and  succinct. 

This  is  recommended  reading  for  the  instructor;  it 
contains  many  examples  useful  in  the  classroom. 

Probert82

Probert, Robert L. “Optimal Insertion of Software
Probes in Well-Delimited Programs.” IEEE Trans.
Software Eng. SE-8, 1 (Jan. 1982), 34-42.

Abstract:  A  standard  technique  for  monitoring  soft¬ 
ware  testing  activities  is  to  instrument  the  module 
under  test  with  counters  or  probes  before  testing 
begins;  then,  during  testing,  data  generated  by 
these probes can be used to identify portions of as
yet  unexercised  code.  In  this  paper  the  effect  of  the 
disciplined  use  of  language  features  for  explicitly 
delimiting  control  flow  constructs  is  investigated 
with  respect  to  the  corresponding  ease  of  software 
instrumentation.  In  particular,  assuming  all  control 
constructs  are  explicitly  delimited,  for  example,  by 
END  IF  or  equivalent  statements,  an  easily  pro¬ 
grammed  method  is  given  for  inserting  a  minimum 
number  of  probes  for  monitoring  statement  and 
branch  execution  counts  without  disrupting  source 
code  structure  or  paragraphing.  The  use  of  these 
probes,  called  statement  probes,  is  contrasted  with 
the  use  of  standard  (branch)  probes  for  execution 
monitoring.  It  is  observed  that  the  results  apply  to 
well-delimited  modules  written  in  a  wide  variety  of 
programming  languages,  in  particular,  Ada. 

The  author  surveys  program  instrumentation  tech¬ 
niques  and  describes  a  specific  method.  The  paper 
is self-contained, and the method described is applicable
to most modern languages.


The paper should be read by the instructor if instrumentation
is discussed. A background in graph
theory and formal grammars is necessary. The
paper is explicit enough to form the basis of a class
project.
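
As a starting point for such a class project, the basic idea of counting probes can be sketched in a few lines of Python. This is a minimal illustration of instrumenting a unit with counters and reading off unexercised code, not the minimum-probe insertion method of the paper; the function `classify`, the helper `probe`, and the probe labels are hypothetical.

```python
from collections import Counter

# Execution counts, keyed by probe label.
PROBES = Counter()

def probe(label):
    """Record one execution of the program point named `label`."""
    PROBES[label] += 1

def classify(n):
    """Hypothetical unit under test, instrumented by hand."""
    probe("entry")
    if n < 0:
        probe("negative")
        return "negative"
    elif n == 0:
        probe("zero")
        return "zero"
    else:
        probe("positive")
        return "positive"

ALL_PROBES = {"entry", "negative", "zero", "positive"}

# Run a small test set, then report the probes that never fired.
for value in (5, 7, -1):
    classify(value)

unexercised = ALL_PROBES - set(PROBES)
print(dict(PROBES), unexercised)
```

Here the "entry" count always equals the sum of the three branch counts, so that probe is redundant. Finding and removing such redundancy systematically is the kind of question the paper addresses.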

Rapps85 

Rapps,  Sandra,  and  Elaine  J.  Weyuker.  “Selecting 
Software  Test  Data  Using  Data  Flow  Information.” 
IEEE Trans. Software Eng. SE-11, 4 (April 1985),
361-375. 

Abstract:  This  paper  defines  a  family  of  program 
test  data  selection  criteria  derived  from  data  flow 
analysis  techniques  similar  to  those  used  in  com¬ 
piler  optimization.  It  is  argued  that  currently  used 
path  selection  criteria,  which  examine  only  the  con¬ 
trol  flow  of  a  program,  are  inadequate.  Our  proce¬ 
dure  associates  with  each  point  in  a  program  at 
which  a  variable  is  defined,  those  points  at  which 
the  value  is  used.  Several  test  data  selection  crite¬ 
ria,  differing  in  the  type  and  number  of  these  associ¬ 
ations,  are  defined  and  compared. 

This  paper  explores  the  hierarchical  relationships 
among  several  data  flow  testing  techniques.  The 
emphasis  is  on  specifying  criteria  that  should  be  sat¬ 
isfied  by  test  data,  not  on  generating  the  data. 

The  paper  should  be  read  by  the  instructor  if  data 
flow  is  to  be  treated  in  depth.  The  paper  is  likely  to 
overwhelm  students. 
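
For students, the central notion of associating each definition of a variable with the points that use its value may be easier to approach through a small example. The Python sketch below assumes a hypothetical five-node flow graph and simplified per-node definition and use sets; the names `SUCC`, `DEFS`, `USES`, `def_use_pairs`, and `covered_by` are illustrative and do not come from the paper, which distinguishes more kinds of uses and criteria than are shown here.

```python
# Hypothetical flow graph for:
#   1: x = read()
#   2: if x > 0:
#   3:     y = x
#      else:
#   4:     y = -x
#   5: print(y)
SUCC = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
DEFS = {1: {"x"}, 3: {"y"}, 4: {"y"}}
USES = {2: {"x"}, 3: {"x"}, 4: {"x"}, 5: {"y"}}

def def_use_pairs(succ, defs, uses):
    """Return (variable, def node, use node) triples joined by a definition-clear path."""
    pairs = set()
    for d_node, variables in defs.items():
        for var in variables:
            stack, seen = list(succ[d_node]), set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                if var in uses.get(node, ()):
                    pairs.add((var, d_node, node))
                # A redefinition ends the definition-clear path.
                if var not in defs.get(node, ()):
                    stack.extend(succ[node])
    return pairs

def covered_by(path, defs, uses):
    """Return the def-use triples exercised by one executed path (a list of nodes)."""
    last_def, covered = {}, set()
    for node in path:
        for var in uses.get(node, ()):
            if var in last_def:
                covered.add((var, last_def[var], node))
        for var in defs.get(node, ()):
            last_def[var] = node
    return covered

PAIRS = def_use_pairs(SUCC, DEFS, USES)
COVERED = covered_by([1, 2, 3, 5], DEFS, USES) | covered_by([1, 2, 4, 5], DEFS, USES)
print(sorted(PAIRS), COVERED == PAIRS)
```

A criterion in this style is used to judge a test set: the two paths above exercise every association, whereas either path alone would leave two of the five uncovered.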

Redwine83

Redwine, Samuel T., Jr. “An Engineering Approach
to Software Test Data Design.” IEEE Trans. Software
Eng. SE-9, 2 (March 1983), 191-200.

Abstract:  A  systematic  approach  to  test  data  design 
is  presented  based  on  both  practical  translation  of 
theory  and  organization  of  professional  lore.  The 
approach  is  organized  around  five  domains  and 
achieving  coverage  (exercise)  of  them  by  the  test 
data.  The  domains  are  processing  functions,  input, 
output,  interaction  among  functions,  and  the  code 
itself.  Checklists  are  used  to  generate  data  for 
processing  functions.  Separate  checklists  have  been 
constructed  for  eight  common  business  data  pro¬ 
cessing  functions  such  as  editing,  updating,  sorting, 
and  reporting.  Checklists  or  specific  concrete 
directions  also  exist  for  input,  output,  interaction, 
and  code  coverage.  Two  global  heuristics  concern¬ 
ing  all  test  data  are  also  used.  A  limited  discussion 
on  documenting  test  input  data,  expected  results, 
and  actual  results  is  included. 

Use,  applicability,  and  possible  expansions  are 
covered  briefly.  Introduction  of  the  method  has 
similar  difficulties  to  those  experienced  when  intro¬ 
ducing  any  disciplined  technique  into  an  area 
where discipline was previously lacking. The approach
is felt to be easily modifiable and usable for
types  of  systems  other  than  the  traditional  business 
data  processing  ones  for  which  it  was  originally 
developed. 

This  is  one  of  the  best  papers  on  a  systematic  means 
of  testing  data  processing  software.  The  value  of 
this  paper  lies  in  its  pragmatic  approach  to  test  data 
selection; there is little theory presented here.

As  an  example  of  applied  testing  in  business  ap¬ 
plications,  this  paper  is  a  winner.  It  could  serve  as  a 
self-assessment  test  for  students  who  must  develop 
an  integrated  method. 

Richardson85

Richardson, Debra J., and Lori A. Clarke. “Partition
Analysis: A Method Combining Testing and
Verification.” IEEE Trans. Software Eng. SE-11, 12
(Dec. 1985), 1477-1490.

Abstract:  The  partition  analysis  method  compares 
a  procedure’s  implementation  to  its  specification, 
both  to  verify  consistency  between  the  two  and  to 
derive  test  data.  Unlike  most  verification  methods, 
partition analysis is applicable to a number of different
types of specification languages, including
both  procedural  and  nonprocedural  languages.  It 
is  thus  applicable  to  high-level  descriptions  as  well 
as  to  low-level  designs.  Partition  analysis  also  im¬ 
proves  upon  existing  testing  criteria.  These  criteria 
usually  consider  only  the  implementation,  but  par¬ 
tition  analysis  selects  test  data  that  exercise  both  a 
procedure's  intended  behavior  (as  described  in  the 
specifications)  and  the  structure  of  its  implemen¬ 
tation.  To  accomplish  these  goals,  partition  anal¬ 
ysis  divides  or  partitions  a  procedure's  domain  into 
subdomains  in  which  all  elements  of  each  sub- 
domain  are  treated  uniformly  by  the  specification 
and  processed  uniformly  by  the  implementation. 
This  partition  divides  the  procedure  domain  into 
more manageable units. Information related to
each subdomain is used to guide in the selection of
test data and to verify consistency between the
specification and the implementation. Moreover,
the  testing  and  verification  processes  are  designed 
to  enhance  each  other.  Initial  experimentation  has 
shown  that  through  the  integration  of  testing  and 
verification, as well as through the use of information
derived from both the implementation and
the  specification,  the  partition  analysis  method  is 
effective  for  evaluating  program  reliability.  This 
paper  describes  the  partition  analysis  method  and 
reports  the  results  obtained  from  an  evaluation  of 
its  effectiveness. 

This  paper  contains  an  excellent  presentation  of  a 
hybrid  approach,  in  which  simultaneous  coverage  of 
both code and specification is attempted. Prerequisite
reading includes domain testing [White80] and
symbolic execution and formal verification [Hantler76].


This paper is essential reading for both instructor
and student.

Richardson88

Richardson,  Debra  J.,  and  Margaret  C.  Thompson. 
“The  Relay  Model  of  Error  Detection  and  Its 
Application.”  Proc.  Second  Workshop  on  Software 
Testing,  Verification,  and  Analysis.  Washington, 
D.C.: IEEE Computer Society Press, 1988, 223-230.

This  paper  discusses  the  uses  of  conditions  that 
must  be  satisfied  in  order  for  an  error  (infection)  to 
be  introduced  into  the  state  of  the  program  and  to 
transfer (propagate) to the output.

The  paper  illustrates  the  complexity  encountered 
when  considering  how  infections  propagate. 
Propagation is broken into two stages: to the initial
infection of a portion of the program’s data state,
and through successive execution to output. The
paper  proposes  a  system  for  studying  infection  and 
propagation  analysis. 

This  paper  should  be  read  by  the  instructor  after 
reading  [Voas91]. 

Richardson89

Richardson,  Debra  J.,  Stephanie  Leif  Aha,  and  Leon 
J.  Osterweil.  “Integrating  Testing  Techniques 
Through  Process  Programming.”  ACM  Software 
Eng.  Notes  14,  8  (Dec.  1989),  219-228. 

Abstract:  Integration  of  multiple  testing  techniques 
is  required  to  demonstrate  high  quality  of  software. 
Technique  integration  has  four  basic  goals: 
reduced  development  costs,  incremental  testing  ca¬ 
pabilities,  extensive  error  detection,  and  cost- 
effective  application.  We  are  experimenting  with 
the  use  of  process  programming  as  a  mechanism  for 
integrating  testing  techniques.  Having  set  out  to 
develop  a  process  that  provides  adequate  coverage 
and comprehensive fault detection, we proposed
synergistic  use  of  Data  Flow  testing  and  Relay  to 
achieve  all  four  goals.  We  developed  a  testing 
process  program  much  as  we  would  develop  a  soft¬ 
ware  product  from  requirements  through  design  to 
implementation  and  evaluation.  We  found  process 
programming  to  be  effective  for  explicitly  integrat¬ 
ing  the  techniques  and  achieving  the  desired  syner¬ 
gism.  Used  in  this  way,  process  programming  also 
mitigates  many  of  the  other  problems  that  plague 
testing  in  the  software  development  process. 

The  paper  requires  a  grounding  in  the  concept  of 
process  programming,  “programming”  the  process 
of  software  development. 

This  paper  is  most  appropriate  for  those  interested 
in  research  into  the  testing  process,  rather  than  test¬ 
ing,  per  se. 

Rowland81

Rowland, John H., and Philip J. Davis. “On the Use
of Transcendentals for Program Testing.” J. ACM
28, 1 (Jan. 1981), 181-190.

Abstract:  The  element  z  is  called  a  transcendental 
for  the  class  F  if  functions  in  F  can  be  uniquely 
identified  by  their  values  at  z.  Conditions  for  the 
existence  of  transcendentals  are  discussed  for  cer¬ 
tain  classes  of  polynomials,  multinomials,  and  ra¬ 
tional  functions.  Of  particular  interest  are  those 
transcendentals  having  an  exact  representation  in 
computer  arithmetic.  Algorithms  are  presented  for 
reconstruction  of  the  coefficients  of  a  polynomial 
from  its  value  at  a  transcendental.  The  theory  is 
illustrated  by  application  to  polynomials,  quadratic 
forms,  and  quadrature  formulas. 

This  paper  presents  many  techniques  for  demon¬ 
strating  that  a  particular  function  has  been  imple¬ 
mented  in  a  computer  program.  The  paper  requires 
a  good  background  in  functional  analysis  to  grasp 
all  the  details.  It  is  very  well  written,  though  it  has 
limited  application. 

The  paper  can  prove  useful  to  the  instructor,  espe¬ 
cially  in  gaining  understanding  of  issues  involved  in 
selecting  test  data  for  particular  program  paths.  It  is 
not  recommended  for  students. 
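
The definition in the abstract can be made concrete with a deliberately simple class of functions. The Python sketch below assumes that F is the set of polynomials whose coefficients are non-negative integers smaller than a known bound B; for that class the point z = B identifies each function uniquely, because the value at B is just the coefficient list read as a base-B numeral. This class, and the names `evaluate`, `reconstruct`, `program`, and `faulty`, are chosen for illustration and are not taken from the paper, which treats more general classes.

```python
def evaluate(coeffs, z):
    """Horner evaluation; coeffs[i] is the coefficient of z**i."""
    result = 0
    for c in reversed(coeffs):
        result = result * z + c
    return result

def reconstruct(value, bound):
    """Recover coefficients in range(bound) from the polynomial's value at z = bound."""
    coeffs = []
    while value > 0:
        value, digit = divmod(value, bound)
        coeffs.append(digit)
    return coeffs

BOUND = 10
INTENDED = [3, 0, 2]          # the specified function 3 + 2*z**2

def program(z):
    """Hypothetical correct implementation."""
    return 3 + 2 * z * z

def faulty(z):
    """Hypothetical faulty implementation: z used instead of z*z."""
    return 3 + 2 * z

# A single test at z = BOUND distinguishes the two implementations.
print(reconstruct(program(BOUND), BOUND), reconstruct(faulty(BOUND), BOUND))
```

Under the stated assumption about F, one test point suffices to decide which member of the class a program computes. The difficulty in practice lies in justifying that the program's function belongs to such a class in the first place.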

Tai80

Tai, Kuo-Chung. “Program Testing Complexity and
Test Criteria.” IEEE Trans. Software Eng. SE-6, 6
(Nov. 1980), 531-538.

Abstract:  This  paper  explores  the  testing  com¬ 
plexity  of  several  classes  of  programs,  where  the 
testing  complexity  is  measured  in  terms  of  the  num¬ 
ber  of  test  data  required  for  demonstrating  program 
correctness  by  testing.  It  is  shown  that  even  for 
very  restrictive  classes  of  programs,  none  of  the 
commonly  used  test  criteria,  namely,  having  every 
statement,  branch,  and  path  executed  at  least  once, 
is  nearly  sufficient  to  guarantee  absence  of  errors. 

Based  on  the  study  of  testing  complexity,  this  paper 
proposes  two  new  test  criteria,  one  for  testing  a 
path  and  the  other  for  testing  a  program.  These 
new  criteria  suggest  how  to  select  test  data  to  ob¬ 
tain  confidence  in  program  correctness  beyond  the 
requirement  of  having  each  statement,  branch,  or 
path  tested  at  least  once. 

This  paper  analyzes  the  complexity  of  achieving 
several structural coverage measures. The inadequacy
of these measures is again shown, along with
new criteria for demonstrating correctness for a
limited  class  of  programs. 

The  paper  should  be  read  by  the  instructor  to  gain 
an  appreciation  of  when  testing  is  equivalent  to 
proving correctness. It is in-depth reading for a
student  interested  in  structural  testing. 


Tai91

Tai, Kuo-Chung, Richard H. Carver, and Evelyn
E. Obaid. “Debugging Concurrent Ada Programs by
Deterministic Execution.” IEEE Trans. Software
Eng. 17, 1 (Jan. 1991), 45-63.

Abstract:  An  execution  of  a  concurrent  program  P 
with input X nondeterministically exercises a sequence
of synchronization events, called a synchronization
sequence (or SYN-sequence). Thus,
multiple executions of P with the same input X may
exercise  different  SYN-sequences  and  produce  dif¬ 
ferent  results.  When  debugging  an  erroneous  ex¬ 
ecution  of  P  with  input  X,  it  is  often  necessary  to 
repeat  this  execution  in  order  to  collect  more  de¬ 
bugging  information.  However,  there  is  no  guar¬ 
antee  that  this  execution  will  be  repeated  by  execut¬ 
ing  P  with  input  X.  To  solve  this  problem  requires 
deterministic  execution  debugging,  which  is  to  force 
a  deterministic  execution  of  a  concurrent  program 
according  to  the  SYN-sequence  of  a  previous  execu¬ 
tion  of  this  program. 

In  this  paper,  we  present  a  language-based  ap¬ 
proach  to  deterministic  execution  debugging  of  con¬ 
current  Ada  programs.  Our  approach  is  to  define 
SYN-sequences  of  a  concurrent  Ada  program  in 
terms  of  Ada  language  constructs  and  to  replay 
such  SYN-sequences  without  the  need  of  system- 
dependent  debugging  tools.  We  first  show  how  to 
define a SYN-sequence of a concurrent Ada program
in order to provide sufficient information for
deterministic execution. Then we show how to
transform a concurrent Ada program P so that the
SYN-sequences of previous executions of P can be
replayed. This transformation adds an Ada task to
P  that  controls  program  execution  by  synchronizing 
with  the  original  tasks  in  P.  We  also  briefly  de¬ 
scribe  the  implementation  of  tools  supporting  deter¬ 
ministic  execution  of  concurrent  Ada  programs. 

This paper provides a technical introduction to testing
concurrent Ada programs using a model that
captures the sequence of concurrent interaction and
enables it to be replayed. See [Carver91] for a less
technical introduction and [Weiss88] for a theoretical
discussion.

The paper should be read by those conducting research
into testing concurrent programs.

Voas91 

Voas, Jeffrey, Larry J. Morell, and Keith Miller.
“Predicting Where Faults Can Hide from Testing.”
IEEE Software 8, 2 (March 1991), 41-48.

This paper introduces the concept of sensitivity
analysis, which estimates the probability that a program
location can hide a fault. The paper is built on
the fault/failure model that is used to structure the
implementation-based testing section of this module.
For a program to fail on a given input, three
necessary and sufficient conditions must be satisfied:
a fault location must be executed, the succeeding
data state must be infected, and the data-state
error must propagate to the output. The paper gives
an overview of execution, infection, and propagation
analysis and discusses how the results of these
analyses can be used to identify program locations
where faults can easily hide.
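
The three conditions can be made concrete with a toy example. The sketch below is only an illustration of the fault/failure model, not the paper's analysis procedure: it assumes a hypothetical one-line fault (`x ** 2` written where `x * 2` was intended) and enumerates a small input domain exhaustively, where the paper speaks of estimating probabilities. The names `run` and `classify` are invented for this example.

```python
def run(x, faulty):
    """Return (executed, data_state, output) for one input."""
    if x < 0:
        return (False, None, False)          # the fault location is not reached
    y = x ** 2 if faulty else x * 2          # hypothetical fault location
    return (True, y, y > 50)

def classify(inputs):
    """Count inputs that execute the location, infect the state, and fail."""
    counts = {"executed": 0, "infected": 0, "failed": 0}
    for x in inputs:
        _, good_state, good_out = run(x, faulty=False)
        executed, bad_state, bad_out = run(x, faulty=True)
        if executed:
            counts["executed"] += 1
            if bad_state != good_state:
                counts["infected"] += 1
        if bad_out != good_out:
            counts["failed"] += 1
    return counts

print(classify(range(-10, 31)))
# {'executed': 31, 'infected': 29, 'failed': 18}
```

Of the 41 inputs, 31 execute the location, 29 of those infect the data state, and only 18 propagate the infection to the output. The remaining inputs are places where this fault hides from testing.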

This paper is the best introduction to the fault/failure
model discussed in this module. It should be
read before some of the more theoretical presentations
in [Morell88], [Richardson88], [Zeil89], and
[Morell90].

Weiser84

Weiser, Mark. “Program Slicing.” IEEE Trans. Software
Eng. SE-10, 4 (July 1984), 352-357.

Abstract: Program slicing is a method for automatically
decomposing programs by analyzing their
data flow and control flow. Starting from a subset
of a program’s behavior, slicing reduces that program
to a minimal form which still produces that
behavior. The reduced program, called a “slice,”
is an independent program guaranteed to represent
faithfully the original program within the domain of
the specified subset of behavior.

Some properties of slices are presented. In particular,
finding statement-minimal slices is in general
unsolvable, but using data flow analysis is sufficient
to find approximate slices. Potential applications
include automatic slicing tools for debugging and
parallel processing of slices.
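
A rough feel for slicing can be had from straight-line code. The sketch below is a deliberately simplified illustration, not Weiser's algorithm: it assumes a program with no branches or loops, represented as a list of hypothetical `(defined_variable, used_variables)` pairs, and follows data dependences backward from the variables of interest. Handling control flow requires the iterative data flow analysis the paper describes.

```python
def backward_slice(stmts, criterion_vars):
    """Return indices of the statements that can affect `criterion_vars`.

    `stmts` is a list of (defined_var, used_vars) pairs for straight-line code.
    """
    relevant = set(criterion_vars)
    keep = []
    for i in range(len(stmts) - 1, -1, -1):
        target, uses = stmts[i]
        if target in relevant:
            keep.append(i)
            relevant.discard(target)   # this definition satisfies the need
            relevant.update(uses)      # its operands are now needed
    return sorted(keep)

program = [
    ("n", []),                  # 0: n = read()
    ("total", []),              # 1: total = 0
    ("prod", []),               # 2: prod = 1
    ("total", ["total", "n"]),  # 3: total = total + n
    ("prod", ["prod", "n"]),    # 4: prod = prod * n
]

print(backward_slice(program, ["total"]))  # [0, 1, 3]
print(backward_slice(program, ["prod"]))   # [0, 2, 4]
```

Each slice is a smaller program that computes the chosen variable exactly as the original does, which is the sense in which a slice preserves a subset of behavior.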

This article underscores the point that the same
analysis technique (data flow in this case) can be
used effectively in many areas of software engineering.
Although slicing has not been applied to testing,
the linkage between data flow testing and program
slicing is inescapable. [Korel90] extends the
static concept of slicing to a dynamic one.

This paper provides a detailed discussion of the theory
underlying program slicing. It requires careful
reading and significant background in data flow
analysis. This is essential reading for the instructor.

Weiss88 

Weiss, Stewart N. “A Formal Framework for the
Study of Concurrent Program Testing.” Proc. Second
Workshop on Software Testing, Verification,
and Analysis. Washington, D.C.: IEEE Computer
Society Press, 1988, 106-113.

Abstract: Representing a concurrent program as a
set of simulating, sequential programs provides a
solution to the reproducible testing problem as well
as a formal foundation for a theory of concurrent
program testing. It is shown how this model of
concurrent programs is used to extend the methods
and theory of testing sequential programs to concurrent
programs.

This paper gives an overview of Weiss’s Ph.D. dissertation
on the testing of concurrent programs.

Only those with a significant background in concurrency
and program testing should read this.

Weyuker80

Weyuker, Elaine J., and Thomas J. Ostrand.
“Theories of Program Testing and the Application of
Revealing Subdomains.” IEEE Trans. Software Eng.
SE-6, 3 (May 1980), 236-246. Reprinted in
[Miller81].

Abstract: The theory of test data selection proposed
by Goodenough and Gerhart is examined. In
order to extend and refine this theory, the concepts
of a revealing test criterion and a revealing subdomain
are proposed. These notions are then used
to provide a basis for constructing program tests.

A subset of a program’s input domain is revealing if
the existence of one incorrectly processed input implies
that all of the subset’s elements are processed
incorrectly. The intent of this notion is to partition
the program’s domain in such a way that all elements
of an equivalence class are either processed
correctly or incorrectly. A test set is then formed by
choosing one element from each class. This process
represents perfect program testing. For a practical
testing strategy, the domain is partitioned into
subdomains which are revealing for errors considered
likely to occur.
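
The definition of a revealing subdomain can be checked mechanically on a small example. The sketch below is an illustration under stated assumptions: the specification is absolute value, `program` is a hypothetical faulty implementation that forgets to negate, and the two partitions are invented for the example. None of this is taken from the paper's three case studies.

```python
def spec(x):
    return abs(x)

def program(x):
    return x  # hypothetical fault: negative inputs are not negated

def is_revealing(subdomain):
    """True if every element is processed correctly, or every element incorrectly."""
    verdicts = {program(x) == spec(x) for x in subdomain}
    return len(verdicts) <= 1

def representatives(partition):
    """Form a test set by choosing one element from each class."""
    return [next(iter(sub)) for sub in partition]

good_partition = [range(-5, 0), range(0, 1), range(1, 6)]
poor_partition = [range(-5, 1), range(1, 6)]

print([is_revealing(s) for s in good_partition])  # [True, True, True]
print([is_revealing(s) for s in poor_partition])  # [False, True]

tests = representatives(good_partition)
print([x for x in tests if program(x) != spec(x)])  # [-5]
```

In the first partition any representative of the negative class exposes the fault. In the second, the class `range(-5, 1)` mixes a correctly processed input (0) with incorrectly processed ones, so a single representative gives no guarantee.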

Three programs which have previously appeared in
the literature are discussed and tested using the notions
developed in the paper.

This is the foundational paper for error-based testing.
The criticism of [Goodenough75] is crisp, and
the paper’s theoretical approach has established it as
a classic.

This is essential reading for the instructor. The student
who wishes to pursue error-based testing must
read it also.

Weyuker82 

Weyuker, Elaine J. “On Testing Non-testable Programs.”
Computer J. 25, 4 (Nov. 1982), 465-470.

Abstract: A frequently invoked assumption in program
testing is that there is an oracle (i.e., the
tester or an external mechanism can accurately
decide whether or not the output produced by a program
is correct). A program is non-testable if either
an oracle does not exist or the tester must expend
some extraordinary amount of time to determine
whether or not the output is correct. The
reasonableness of the oracle assumption is examined
and the conclusion is reached that in many
cases this is not a realistic assumption. The consequences
of assuming the availability of an oracle
are examined and alternatives investigated.

Oracles may be unavailable for a number of reasons,
e.g., the correct output may not be known or
may be extremely difficult to compute.

This paper is essential reading for the instructor,
and it provides students with a useful description of
pragmatic difficulties of testing theory and practice.

Weyuker84 

Weyuker, Elaine J. “The Complexity of Data Flow
Criteria for Test Data Selection.” Information Processing
Letters 19, 2 (Aug. 1984), 103-109.

This  paper  analyzes  the  theoretical  upper  bound  on 
the  number  of  test  cases  necessary  to  cover  all  the 
definition-use  pairs  of  a  program. 

This  is  expert  reading  in  data  flow  testing. 

Weyuker86 

Weyuker,  Elaine  J.  “Axiomatizing  Software  Test 
Data  Adequacy.”  IEEE  Trans.  Software  Eng.  SE-12, 
12  (Dec.  1986),  1128-1138. 

Abstract: A test data adequacy criterion is a set of
rules used to determine whether or not sufficient
testing has been performed. A general axiomatic
theory of test data adequacy is developed, and five
previously proposed adequacy criteria are examined
to see which of the axioms are satisfied. It
is shown that the axioms are consistent, but that
only two of the criteria satisfy all of the axioms.

A set of meta-criteria (called axioms) is established
for evaluating test data adequacy criteria.
Criticism of this article appears, along with a reply
by Weyuker, in [Zweben89].

This article is for researchers in program testing theory.

Weyuker88 

Weyuker, Elaine J. “An Empirical Study of the
Complexity of Data Flow Testing.” Proc. Second
Workshop on Software Testing, Verification, and
Analysis. Washington, D.C.: IEEE Computer Society
Press, 1988, 188-195.

Abstract: A family of test data adequacy criteria
employing data flow information has been
previously proposed, and theoretical complexity
analysis performed. This paper describes an empirical
study to help determine the actual cost of
using these criteria. This should help establish the
practical usefulness of these criteria in testing software,
and serve as a means of predicting the
amount of testing needed for a given program.

The programs studied were taken from the book
Software Tools in Pascal by Brian W. Kernighan
and P. J. Plauger. The study was motivated by the
theoretical work in [Weyuker84], which indicated
that data flow testing can require a number of tests
exponentially related to the number of statements in
the software. The study found that for most of the
software tested, only linearly many tests were
needed. The impact of infeasible definition-use
pairs was important, however.

The  paper  contains  some  interesting  discussion  of 
experimental  design.  The  software  tested  is  readily 
available,  making  this  paper  a  good  starting  point 
for  comparison  experiments. 

White80 

White, Lee J., and Edward I. Cohen. “A Domain
Strategy for Computer Program Testing.” IEEE
Trans. Software Eng. SE-6, 3 (May 1980), 247-257.
Reprinted in [Miller81].

Abstract: This paper presents a testing strategy
designed to detect errors in the control flow of a
computer program, and the conditions under which
this strategy is reliable are given and characterized.
The control flow statements in a computer program
partition the input space into a set of mutually exclusive
domains, each of which corresponds to a
particular program path and consists of input data
points which cause that path to be executed. The
testing strategy generates test points to examine the
boundaries of a domain to detect whether a domain
error has occurred, as either one or more of these
boundaries will have shifted or else the corresponding
predicate relational operator has changed. If
test points can be chosen within ε of each boundary,
under the appropriate assumptions, the strategy is
shown to be reliable in detecting domain errors of
magnitude greater than ε. Moreover, the number of
test points required to test each domain grows only
linearly with both the dimensionality of the input
space and the number of predicates along the path
being tested.
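
The role of ε in this abstract can be illustrated in one dimension. The sketch below is a simplified illustration, not the paper's procedure: it assumes a single linear predicate `x <= bound`, tests with one point on the intended boundary and one point a distance `eps` beyond it, and assumes the two paths produce distinguishable outputs. The names `make_program` and `detects_shift` are hypothetical.

```python
def make_program(bound):
    """Build a one-predicate program whose two paths give different outputs."""
    def program(x):
        return "in" if x <= bound else "out"
    return program

def detects_shift(intended_bound, actual_bound, eps):
    """Do an ON point and an OFF point expose a shifted boundary?"""
    intended = make_program(intended_bound)
    actual = make_program(actual_bound)
    on_point = intended_bound          # lies on the intended boundary
    off_point = intended_bound + eps   # lies just outside the intended domain
    return any(intended(p) != actual(p) for p in (on_point, off_point))

print(detects_shift(10, 12, 0.5))     # True: shift of 2 exceeds eps
print(detects_shift(10, 9, 0.5))      # True: the ON point now takes the other path
print(detects_shift(10, 10.25, 0.5))  # False: shift of 0.25 is smaller than eps
```

A boundary shift smaller than `eps` slips between the two test points, which is why the reliability result is stated for errors of magnitude greater than ε.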

This is the fundamental paper on domain testing, an
error-based testing strategy. The paper focuses on
testing errors in the control flow of programs whose
predicates have linear interpretation in the input
variables. Note that the restrictions specified in the
paper, especially linearity and the absence of arrays,
limit the applicability of this strategy mostly to data
processing programs. The strategy is examined
closely in [Clarke82] and complemented by the approach
in [Zeil83].

This paper is very well written and requires little
background, though [Howden76] should probably be
read first. It is essential reading for the instructor,
and students will find it very readable.

Woodward80

Woodward,  Martin  R.,  David  Hedley,  and  Michael 
A.  Hennell.  “Experience  with  Path  Analysis  and 
Testing  of  Programs.”  IEEE  Trans.  Software  Eng. 
SE-6,  3  (May  1980),  278-286.  Reprinted  in 
[Miller81]. 

Abstract: There are a number of practical difficulties
in performing a path testing strategy for
computer programs. One problem is in deciding
which paths, out of a possible infinity, to use as test
cases. A hierarchy of structural test metrics is suggested
to direct the choice and to monitor the
coverage of test paths. Another problem is that
many of the chosen paths may be infeasible in the
sense that no test data can ever execute them. Experience
with the use of “allegations” to circumvent
this problem and prevent the static generation
of many infeasible paths is reported.

This  paper  introduces  the  concept  of  LCSAJ,  a 
linear  code  sequence  and  jump,  which  has  since 
been  used  as  a  structural  measure  in  several  diverse 
experiments. 

The paper should be read by the instructor interested
in practical methods of structural testing. Students
will find the paper difficult but rewarding.

Woodward88 

Woodward, M. R., and K. Halewood. “From Weak
to Strong, Dead or Alive? An Analysis of Some
Mutation Testing Issues.” Proc. Second Workshop
on Software Testing, Verification, and Analysis.
Washington, D.C.: IEEE Computer Society Press,
1988, 152-158.

Abstract: Despite the intrinsic appeal of the mutation
approach to testing, its disadvantage in being
computationally expensive has hampered its widespread
acceptance. When weak mutation was introduced
as a less expensive and less stringent form of
mutation testing, the original technique was
renamed strong mutation. This paper argues that
strong mutation testing and weak mutation testing
are in fact extreme ends of a spectrum of mutation
approaches. The term firm mutation is introduced
to represent the middle ground in this spectrum.
The paper also argues, by means of a number of
small examples, that there is a potential problem
concerning the criterion for deciding whether a
mutant is “dead” or “live.” A variety of solutions are
suggested. Finally, practical considerations for a
firm mutation testing system, with greater user control
over the nature of result comparison, are discussed.
Such a system is currently under development
as part of an interpretive development environment.
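
The two ends of the spectrum differ in where results are compared. The sketch below is a small illustration under assumptions of my own choosing: a hypothetical mutant replaces `a + b` with `a - b`, weak mutation is modeled as comparing the value computed at the mutated statement, and strong mutation as comparing the final output. Firm mutation would compare at some point in between.

```python
def original(a, b):
    t = a + b
    return t, t > 0        # (state after the statement, final output)

def mutant(a, b):
    t = a - b              # hypothetical mutation: + replaced by -
    return t, t > 0

def weakly_killed(test):
    """Compare the state immediately after the mutated statement."""
    return original(*test)[0] != mutant(*test)[0]

def strongly_killed(test):
    """Compare only the final output."""
    return original(*test)[1] != mutant(*test)[1]

for test in [(3, 1), (1, 3), (2, 0)]:
    print(test, weakly_killed(test), strongly_killed(test))
# (3, 1) True False
# (1, 3) True True
# (2, 0) False False
```

The test `(3, 1)` changes the intermediate value but not the output, so the mutant is dead under the weak criterion and live under the strong one. This is the kind of disagreement over “dead” and “live” that the abstract raises.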


Firm mutation represents the practical implementation
of the extent property of fault-based techniques
discussed in [Morell88]. This paper is indicative
of a growing understanding of the importance
of analysis of propagation in program testing.

The paper assumes a good grounding in mutation
testing and knowledge of practical problems associated
with it.

Young88 

Young, Michal, and Richard N. Taylor. “Combining
Static Concurrency Analysis with Symbolic Execution.”
IEEE Trans. Software Eng. 14, 10 (Oct.
1988), 1499-1511.

Abstract: Static concurrency analysis detects
anomalous synchronization patterns in concurrent
programs, but may also report spurious errors involving
infeasible execution paths. Integrated application
of static concurrency analysis and symbolic
execution sharpens the results of the former
without incurring the full costs of the latter applied
in isolation. Concurrency analysis acts as a path
selection mechanism for symbolic execution, while
symbolic execution acts as a pruning mechanism for
concurrency analysis. Methods for combining the
techniques follow naturally from explicit characterization
and comparison of the state spaces explored
by each, suggesting a general approach for
integrating state-based program analysis techniques
in a software development environment.

Many have proposed augmenting flow analysis with
symbolic execution to minimize the impact of infeasible
paths. This paper clearly presents the advantages
and the difficulties of such an integration.

Since this paper treats the intersection of three subjects
(concurrency, flow analysis, and symbolic
execution), significant background is necessary before
reading.

Youngblut89 

Youngblut, Christine, et al. SDS Software Testing
and Evaluation: A Review of the State-of-the-Art in
Software Testing and Evaluation with Recommended
R&D Tasks. IDA Paper P-2132, Institute for Defense
Analyses, Alexandria, Va., Feb. 1989.

This report discusses almost all areas of program
analysis and testing at all levels (unit, integration,
system, and acceptance) and evaluates them in the
context of SDI applications. (The report was prepared
for the Strategic Defense Initiative Organization.)
An extensive glossary is included. This is a
companion to [Brykczynski89].

Zeil83

Zeil, Steven J. “Testing for Perturbations of Program
Statements.” IEEE Trans. Software Eng. SE-9, 3
(May 1983), 335-346.

Abstract: Many testing methods require the selection
of a set of paths on which tests are to be conducted.
Errors in arithmetic expressions within
program statements can be represented as perturbing
functions added to the correct expression. It is
then possible to derive the set of errors in a chosen
functional class which cannot possibly be detected
using a given test path. For example, test paths
which pass through an assignment statement “X :=
f(Y)” are incapable of revealing if the expression
“X - f(Y)” has been added to later statements. In
general, there are an infinite number of such undetectable
error perturbations for any test path.
However, when the chosen functional class of error
expressions is a vector space, a finite characterization
of all undetectable expressions can be found for
one test path, or for combined testing along several
paths. An analysis of the undetectable perturbations
for sequential programs operating on integers
and real numbers is presented which permits
the detection of multinomial error terms. The reduction
of the space of (potential) undetected errors
is proposed as a criterion for test path selection.
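
The abstract's own example is easy to reproduce. In the sketch below, the function `f`, the later statement `2 * x`, and the input range are assumptions made for illustration. The point is only that once a path has executed `x = f(y)`, the added term `x - f(y)` is zero for every input, so no test along that path can reveal it.

```python
def f(y):
    return 3 * y + 1                   # hypothetical f

def correct(y):
    x = f(y)                           # the assignment X := f(Y)
    return 2 * x                       # a later statement

def perturbed(y):
    x = f(y)
    return 2 * x + (x - f(y))          # later statement with X - f(Y) added

# Every test on this path yields identical outputs, so the perturbation is undetectable.
print(all(correct(y) == perturbed(y) for y in range(-100, 101)))  # True
```

Exposing such a perturbation requires a different path, one on which `x` does not equal `f(y)` when the later statement is reached. This is why the abstract ties the set of undetectable errors to the choice of test paths.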

This paper describes a method for deducing sufficient
path coverage to ensure the absence of
prescribed errors in a program. It models the program
computation and potential errors as a vector
space. This enables the conditions for nondetection
of an error to be calculated. The strategy
assumes the existence of a reliable testing strategy
for paths, which, of course, does not exist.

Exposure to [White80] should provide sufficient
background for appreciating the context in which
the techniques are to be used. To understand the
mathematics requires some background in linear algebra,
especially if some of the references are to be
read. The paper explores an interesting area and
deserves to be read by the instructor. This is advanced
reading for students, however.

Zeil88 

Zeil, Steven J. “Selectivity of Data-Flow and
Control-Flow Path Criteria.” Proc. Second Workshop
on Software Testing, Verification, and Analysis.
Washington, D.C.: IEEE Computer Society
Press, 1988, 216-222.

Abstract: A given path selection criterion is more
selective than another such criterion with respect to
some testing goal if it never requires more, and
sometimes requires fewer, test paths to achieve that
goal. This paper presents canonical forms of
control-flow and data-flow path selection criteria
and demonstrates that, for some simple testing
goals, the data-flow criteria as a general class are
more selective than the control-flow criteria. It is
shown, however, that this result does not hold for
general testing goals, a limitation that appears to
stem directly from the practice of defining data-flow
criteria on the computation history contributing to a
single result.

This paper challenges the reader to consider methods
of comparing path-selection criteria other than
subsumption.

The  paper  assumes  the  reader  is  familiar  with  data 
flow  testing  at  the  level  of  [Clarke89]. 

Zeil89 

Zeil, Steven J. “Perturbation Techniques for Detecting
Domain Errors.” IEEE Trans. Software Eng. 15,
6 (June 1989), 737-746.

Abstract: Perturbation testing is an approach to
software testing which focuses on faults within
arithmetic expressions appearing throughout a program.
In this paper perturbation testing is expanded
to permit analysis of individual test points
rather than entire paths, and to concentrate on
domain errors. Faults are modeled as perturbing
functions drawn from a vector space of potential
faults and added to the correct form of an arithmetic
expression. Sensitivity measures are derived
which limit the possible size of those faults that
would go undetected after the execution of a given
test set. These measures open up an interesting new
view of testing, in which attempts are made to
reduce the volume of possible faults which, were
they present in the program being tested, would
have escaped detection on all tests performed so
far. The combination of these measures with standard
optimization techniques yields a new test data
generation method, called arithmetic fault detection.

This paper extends Zeil’s earlier paper [Zeil83] by
treating aspects of propagation not considered before.
Program computations are modeled as vector
spaces, and program faults as perturbations of those
vector spaces. The perturbation model is somewhat
analogous to the symbolic execution model discussed
in [Morell90].

The paper requires significant mathematical sophistication
to understand the model proposed. It is
expert reading in error-based testing.

Zweben89 

Zweben, Stuart H., and John S. Gourlay. “On the
Adequacy of Weyuker’s Test Data Adequacy Axioms.”
IEEE Trans. Software Eng. 15, 4 (April
1989), 496-501.

Abstract: Weyuker has recently proposed a set of
properties which should be satisfied by any reasonable
criterion used to claim that a computer program
has been adequately tested. She called these
properties “axioms.” She also evaluated several
well-known testing strategies with respect to these
properties, and concluded that some of the commonly
used strategies failed to satisfy several of the
properties.

We question both the fundamental nature of the
properties and the precision with which they are
presented, and illustrate how a number of ideas in
Weyuker’s paper can be simplified and clarified
through greater precision and a more consistent set
of definitions. We also reanalyze the testing strategies
after accounting for these inconsistencies.
The strategies tend to fare much better as a result of
this reanalysis.

The authors raise the issue of what makes an
axiomatic system, as well as what constitutes a
proper axiom. This criticism must be read
along with [Weyuker86]. Weyuker responds to the
criticism at the end of the article.

If students have never seen such a professional interchange,
this is worth reading for that aspect
alone.